1 Objectives of the Research Effort

Teleos  Research,  under  the  sponsorship  of  the  Air  Force  Office  of  Scientific  Research,  has 
carried  out  a  two-year  program  of  research  on  “The  Synthesis  of  Intelligent  Real-Time 
Systems.”  The  purpose  of  the  effort  was  to  develop  and  extend  theories  and  techniques 
that  facilitate  the  design  and  implementation  of  intelligent  real-time  systems.  These  are 
embedded  computer  systems  that  are  linked  to  the  external  world  through  sensors  and 
effectors  and  are  programmed  to  interpret  sensor  data  and  to  produce  flexible,  goal-directed 
behavior  continuously  and  in  real  time.  Systems  of  this  kind  will  be  of  crucial  importance 
to  a  wide  variety  of  military  and  industrial  applications,  including  robotics,  process  control, 
real-time  situation  monitoring,  and  space  applications.  If  this  potential  is  to  be  realized, 
better  techniques  will  be  required  for  producing  the  sophisticated  software  that  lies  at  the 
heart  of  such  systems. 

In previous research, Teleos personnel have developed situated-automata theory, a new
approach  toward  modeling  and  programming  intelligent  real-time  systems.  This  approach 
combines  the  flexibility  of  symbolic  reasoning  systems  with  the  performance  of  real-time 
control  systems.  By  identifying  and  encapsulating  high-level  abstractions,  it  is  possible 
to  raise  the  conceptual  level  at  which  such  systems  are  programmed  and  to  improve  the 
efficiency  of  the  programmer,  as  well  as  the  capabilities  of  the  target  system. 

The  objectives  of  the  research  effort  carried  out  for  the  AFOSR  were  to  test  and  extend 
this  existing  approach  in  the  following  ways: 

•  Extend  situated-automata  theory  to  apply  to  situations  in  which  the  system  has 
probabilistic  information  about  the  world; 

•  Design  and  build  a  high-level,  declarative  programming  tool  for  synthesizing  efficient 
programs  that  track  dynamic  conditions  in  the  world; 

• Understand the theoretical relationships between Gapps, an existing declarative
programming tool for describing action strategies, and the newly designed tool;

•  Explore  the  possibility  of  moving  the  burden  of  developing  correct  programs  from 
the  human  programmer  to  the  agent  itself  through  the  use  of  algorithms  that  allow 
the  agent  to  learn  from  trial  and  error; 

•  Apply  the  principles  of  situated-automata  theory  to  the  understanding  of  existing 
vision  algorithms  and  the  development  of  new  ones;  and 

•  Test  theoretical  principles  and  design  tools  in  a  real  robotic  domain. 

The  combined  progress  in  each  of  these  areas  has  allowed  us  to  take  significant  steps 
toward an ideal situation in which (1) a programmer gives a high-level, declarative
specification of the dynamics of the world and of the intended behavior of the agent; (2) the
specification  is  compiled  into  highly  efficient  code  for  a  real  robotic  agent  with  complex 
effectors  and  visual  sensors;  and  (3)  the  agent  acts  in  the  real  world,  filling  out  the  details 
and  correcting  errors  in  its  specification  by  learning  from  its  experience  in  the  real  world. 

2 Status of the Research Effort


In  this  section  we  will  summarize  our  progress  in  each  of  the  areas  described  above  and 
provide  references  to  more  detailed  accounts,  which  have  been  included  as  appendices  to 
this  report. 

• We finished the design and implementation of Ruler, a declarative language from
which efficient declarative programs can be synthesized. Ruler allows the user to
specify the behavior of the world using Prolog-like rules. The user then selects
particular properties of the world that should be tracked, and the compiler, using
backward-chaining proof techniques, generates a circuit that tracks the desired
properties in the world. This system and its theoretical foundations are discussed in
detail in Appendix A.

• Stanley Rosenschein and Michal Irani (a visiting student) developed a probabilistic
formulation of situated-automata theory, including a calculus for the combination
of probabilistic information. In the deterministic version of situated-automata
theory, the information content of an agent’s state is modeled in terms of environment
conditions implied by the agent’s being in that state. The theory can be generalized,
and its utility extended, by allowing situations in which the correlation between the
agent’s state and the environment is probabilistic rather than deterministic.
Unfortunately, because of the non-monotonic nature of probabilities, this extension does
not immediately yield a compositional design methodology: a condition that is highly
probable given the state of a component may be highly improbable given the states
of other components. To overcome this problem, we developed an approach based on
stable probabilities, in which the designer makes assertions that bound the probability
of certain propositions given partial information about the agent’s state. A prototype
design tool was implemented in Prolog that, like Ruler, derived circuits
compositionally from declarative specifications but, unlike Ruler, was grounded in the more
general probabilistic model of information.

•  There  is  an  elegant  theoretical  relationship  between  the  notion  of  a  “goal,”  as  used  in 
the  Gapps  language,  and  the  notion  of  “information,”  as  used  in  the  Ruler  language. 
If an agent can be thought of as having a single overall goal, N, then for it to have
another  goal,  P,  can  be  modeled  as  the  agent’s  having  the  information  that  P  implies 
N.  This  duality  of  goals  and  information  is  the  subject  of  a  more  detailed  discussion 
presented  in  Appendix  B. 

In  addition  to  this  theoretical  work,  the  Rex  and  Gapps  languages  were  extended  and 
enhanced  in  a  variety  of  ways.  In  order  to  support  robotic  experimentation  and  more 
efficient  debugging,  new  code  generators  that  generate  Lisp  and  C  as  output  were 
added  to  the  Rex  compiler.  The  Gapps  compilation  process  was  sped  up  significantly 
through  the  addition  of  a  caching  mechanism.  Finally,  a  new  construct  was  added 
to  the  Rex  language  that  allows  the  execution  of  individual  program  modules  to  be 
triggered  by  other  conditions  in  the  program.  This  construct  increases  the  efficiency 
of  programs  that  have  modules  whose  results  are  used  only  occasionally. 

• The topic of reinforcement learning, or learning from trial and error, was studied
extensively by Leslie Kaelbling. The results of this study included a number of new
algorithms for efficient reinforcement learning in embedded agents and culminated
in a demonstration of these techniques on a small mobile robot. The foundations
of this approach to learning are described in Appendix C, and some algorithms for
efficiently learning Boolean functions in k-DNF from reinforcement are described in
Appendix D. In addition, Kaelbling’s Ph.D. thesis on this topic is included under
separate cover.

• We studied a variety of existing algorithms for machine vision. In particular, we used
situated-automata-theoretic techniques to characterize several model-based vision
algorithms, including Goad’s method and Grimson’s method, both of which match
model features to image features to generate and filter hypotheses about object
identity and to refine information about object parameters. While these investigations
yielded a better understanding of the logical basis of these particular algorithms, they
did not immediately suggest how to generalize the algorithms to less constrained
domains or how to compile declarative descriptions of such domains into perceptual
recognition circuitry. These remain important topics for future research.

In  addition,  David  Chapman  extended  his  work  on  the  design  of  a  set  of  visual 
primitives  and  routines  over  those  primitives  that  can  be  used  to  support  reactive 
behavior  in  embedded  agents.  Appendix  E  describes  his  work  in  detail. 

• Finally, we designed an architecture for a complex robotic demonstration system.
The specification for the architecture, included as Appendix F, describes a class of
demonstration tasks, an efficient symbolic database and query language, and an
interface to low-level machine-vision tools. A large part of the specification was
implemented, and the rest remains to be carried out under future research
projects.

3 Publications


• H. Keith Nishihara, “Tests of a Sign Correlation Model for Binocular Stereo,”
Investigative Ophthalmology and Visual Science, vol. 30, no. 3, March, 1989.

• H. Keith Nishihara, “Psychophysical and Computational Tests Comparing the
Sign-Correlation and Zero-Crossing Models of Human Stereo Vision,” Image Understanding
and Machine Vision, 1989 Technical Digest Series, 14, Optical Society of America,
Washington, D.C., 1989.

• Stanley J. Rosenschein, “Synthesizing Information-Tracking Automata from
Environment Descriptions,” in Proceedings of the First Annual Conference on Principles
of Knowledge Representation, Toronto, Canada, May, 1989.

• Stanley J. Rosenschein, “Synthesizing Information-Update Functions Using Off-Line
Symbolic Processing,” in Proceedings of the Society of Photo-Optical Instrumentation
Engineers Symposium on Advances in Intelligent Robotics Systems, Philadelphia,
Pennsylvania, 1989.

• Stanley J. Rosenschein and Leslie Pack Kaelbling, “Integrating Planning and
Reactive Control,” in Proceedings of the NASA/JPL Space Telerobotics Conference,
Pasadena, California, 1989.

• Leslie Pack Kaelbling, “A Formal Framework for Learning in Embedded Systems,”
in Proceedings of the Sixth International Workshop on Machine Learning, Ithaca,
New York, 1989.

•  Leslie  Pack  Kaelbling,  Learning  in  Embedded  Systems,  Ph.D.  Dissertation,  Stanford 
University,  1990.  Also  published  as  Teleos  Research  Technical  Report  No.  TR-90-04, 
August,  1990. 

• Leslie Pack Kaelbling and Stanley J. Rosenschein, “Action and Planning in
Embedded Agents,” in Robotics and Autonomous Systems, vol. 6, pp. 35-48, 1990. Also in
New Architectures for Autonomous Agents: Task-Level Decomposition and Emergent
Functionality, P. Maes, Ed., MIT Press (in press).

• Leslie Pack Kaelbling, “Learning Functions in k-DNF from Reinforcement,” in
Proceedings of the Seventh International Conference on Machine Learning, Austin, Texas,
June, 1990.

•  David  Chapman,  Intermediate  Vision:  Architecture,  Implementation,  and  Use,  Teleos 
Research  Technical  Report  No.  TR-90-06,  October,  1990.  Submitted  to  Cognition. 

• Leslie Pack Kaelbling, “Generating Complex Behavior for Computer Agents,” in
Proceedings of the DARPA Planning Workshop, San Diego, California, November,
1990.

• Leslie Pack Kaelbling, “Foundations of Learning in Autonomous Agents,” in Robotics
and Autonomous Systems (in press). Also in Toward Learning Robots, W. Van de
Velde, Ed., MIT Press (in press).

4 Personnel

Supervisory: 


• Stanley J. Rosenschein, Ph.D., 1975: “Structuring a Pattern Space, With
Applications to Lexical Information and Event Interpretation.”

Senior  Professional: 

•  H.  Keith  Nishihara,  Ph.D.,  1978:  “Representation  of  the  Spatial  Organization  of 
Three-Dimensional  Shapes  for  Visual  Recognition.” 

Professional: 

•  David  Chapman,  Ph.D.,  1990:  “Vision,  Instruction,  and  Action.” 

•  Neil  Hunt,  Ph.D.,  1989:  “Tools  for  Image  Processing  and  Computer  Vision.” 

•  Leslie  Pack  Kaelbling,  Ph.D.,  1990:  “Learning  in  Embedded  Systems.” 

•  Jeffrey  R.  Kerr,  Ph.D.,  1985:  “An  Analysis  of  Multi-Fingered  Hands.” 

• Nathan J. Wilson, M.A., 1987: “Developing a Computational Model of Biological
Motion to Study Concept Formation.”

5 Interactions

5.1  Papers  Presented 

• H. Keith Nishihara, “Tests of a Sign Correlation Model for Binocular Stereo,”
Investigative Ophthalmology and Visual Science, vol. 30, no. 3, March, 1989.

• H. Keith Nishihara, “Psychophysical and Computational Tests Comparing the
Sign-Correlation and Zero-Crossing Models of Human Stereo Vision,” Image Understanding
and Machine Vision, 1989 Technical Digest Series, 14, Optical Society of America,
Washington, D.C., 1989.

• Leslie Pack Kaelbling, “Foundations of Learning in Autonomous Agents,” at the
Workshop on Representation and Learning in Autonomous Agents, Lagos, Portugal,
1988.

• Stanley J. Rosenschein, “Synthesizing Information-Tracking Automata from
Environment Descriptions,” at the First Annual Conference on Principles of Knowledge
Representation, Toronto, Canada, May, 1989.

• Stanley J. Rosenschein, “Synthesizing Information-Update Functions Using Off-Line
Symbolic Processing,” at the Society of Photo-Optical Instrumentation Engineers
Symposium on Advances in Intelligent Robotics Systems, Philadelphia,
Pennsylvania, 1989.

• Stanley J. Rosenschein and Leslie Pack Kaelbling, “Integrating Planning and
Reactive Control,” at the NASA/JPL Space Telerobotics Conference, Pasadena,
California, 1989.

•  Leslie  Pack  Kaelbling,  “A  Formal  Framework  for  Learning  in  Embedded  Systems,” 
at  the  Sixth  International  Workshop  on  Machine  Learning,  Ithaca,  New  York,  1989. 

• Leslie Pack Kaelbling, “Learning Functions in k-DNF from Reinforcement,” at the
Seventh  International  Conference  on  Machine  Learning,  Austin,  Texas,  June,  1990. 

•  Leslie  Pack  Kaelbling,  “Generating  Complex  Behavior  for  Computer  Agents,”  at  the 
DARPA  Planning  Workshop,  San  Diego,  California,  October,  1990. 

5.2  Other  Presentations 

•  Leslie  Pack  Kaelbling,  “Intelligent  Robots  in  the  Real  World,”  invited  talk  at  the 
Eleventh  IFIP  World  Computer  Congress,  San  Francisco,  California,  1989. 

•  H.  Keith  Nishihara,  “A  Machine  Theory  for  Human  Stereo  Vision,”  Apple  Computer, 
August,  1989. 

•  Stanley  J.  Rosenschein,  participant  in  NASA/Ames  Applications  Workshop,  Moffett 
Field,  California,  October,  1989. 

• Stanley J. Rosenschein, guest speaker at Artificial Intelligence/Robotics Seminar
Series, Computer Science Division, University of California, Berkeley, California,
November, 1989.

• Stanley J. Rosenschein and Leslie Pack Kaelbling, participants in Workshop on
Intelligent Real-Time Problem Solving, Santa Cruz, California, November, 1989.

•  Stanley  J.  Rosenschein,  co-organizer  and  presenter  with  M.  Pollack  and  M.  Bratman, 
seminar  series,  “Models  of  Rational  Agency,”  for  Center  for  the  Study  of  Language 
and Information, Stanford University, Stanford, California, Fall, 1989.

• H. Keith Nishihara, “Hidden Information in Transparent Stereograms,” CCRMA
Seminar,  Stanford  University,  March,  1990. 

•  Leslie  Pack  Kaelbling,  “Planning  and  Action  in  Robotics  and  AI,”  invited  talk  at  the 
International  Symposium:  Artificial  Intelligence,  What  Reality?,  Rabat,  Morocco, 
May,  1990. 

•  David  Chapman,  Leslie  Pack  Kaelbling,  and  Stanley  J.  Rosenschein,  participants  in 
DARPA  Workshop  on  Benchmarks  and  Metrics  for  Integrated  Agent  Architectures, 
July,  1990. 

•  Stanley  J.  Rosenschein,  “Reasoning  and  Acting  in  Real  Time,”  invited  talk  at  the 
Eighth National Conference on Artificial Intelligence, Boston, Massachusetts,
August, 1990.

6 Discoveries and Inventions


There have been no new discoveries, inventions, or patent disclosures or applications
stemming from this research effort.

7 Other Statements


The  following  papers,  which  are  included  as  appendices  to  this  report,  provide  a  detailed 
description  of  the  research  progress  achieved  under  this  contract: 

• Stanley J. Rosenschein, “Synthesizing Information-Tracking Automata from
Environment Descriptions.”

• Leslie Pack Kaelbling and Stanley J. Rosenschein, “Action and Planning in
Embedded Agents.”

•  Leslie  Pack  Kaelbling,  “Foundations  of  Learning  in  Autonomous  Agents.” 

• Leslie Pack Kaelbling, “Learning Functions in k-DNF from Reinforcement.”

•  David  Chapman,  “Intermediate  Vision:  Architecture,  Implementation,  and  Use.” 

•  Leslie  Pack  Kaelbling,  Neil  D.  Hunt,  Stanley  J.  Rosenschein,  H.  Keith  Nishihara, 
Nathan  J.  Wilson,  Laura  E.  Wasylenki,  and  Jeffrey  R.  Kerr,  “Cooperative  Robot 
Demonstration: Working Document.”

In addition, Leslie Pack Kaelbling completed her Ph.D. thesis under the partial
sponsorship of this contract. It is titled Learning in Embedded Systems and has been
included under separate cover.

Synthesizing Information-Tracking Automata from
Environment Descriptions

Stanley  J.  Rosenschein 

Teleos  Research 

Technical  Report  No.  2 
July  3,  1989 


Abstract 

This paper explores the synthesis of finite automata that dynamically track
conditions in their environment. We propose an approach in which a description of the
automaton is derived automatically from a high-level declarative specification of the
automaton’s environment and the conditions to be tracked. The output of the
synthesis process is the description of a sequential circuit that at each clock cycle updates
the automaton’s internal state in constant time, preserving as an invariant the
correspondence between the state of the machine and conditions in the environment. The
proposed approach allows much of the expressive power of declarative programming
to be retained while ensuring the reactivity of the run-time system.


This  work  was  supported  in  part  by  a  gift  from  the  System  Development  Foundation. 



1  Introduction 

This paper is concerned with the synthesis of finite automata whose internal states
are provably correlated with changing conditions in the environment. In earlier
work [Rosenschein 1985, Rosenschein and Kaelbling 1986], we investigated the
mathematical foundations of embedded machines and direct methods of programming
them. Later research was aimed at raising the conceptual level of the programming
task by exploring declarative techniques for synthesizing their action-selection
circuitry [Kaelbling 1988]. In this paper, we extend this line of research to perceptual
updates, that is, the computations responsible for updating the internal information
state of the machine. We present techniques that allow programmers to describe
the environment in which a machine is to be embedded, along with conditions to
be tracked, and to have these descriptions algorithmically transformed into provably
real-time circuitry for tracking those conditions in the environment. Information
about these conditions would be used by other parts of the system to guide action.

Mainstream theoretical AI has developed models of information and action based
on formalized commonsense psychology. In this approach, intelligent computer
systems are modeled as having at their disposal a set of propositional “beliefs,” usually
assumed to be embodied in a set of symbolic expressions, such as logical formulas,
whose intended semantics are clear to the designer. Some of these beliefs are
provided by the designer as part of a knowledge base, while others are produced by
the perceptual system at run time. In addition, the system contains inference
procedures for dynamically deriving new beliefs from old and for continuously revising
beliefs over time in response to sensory inputs (and perhaps reflection). In this way,
the designer can arrange for the agent to have access to a much more complex set
of beliefs than could have been enumerated explicitly in advance. The designer also
provides symbolic representations of the goals the agent is to pursue. The agent
continuously attempts to deduce which actions it should take to achieve its goals
and then performs those actions.

By modeling the information available to the system as symbolic facts deducible
by the system, the traditional approach allows the methods of symbolic logic,
including automated symbolic inference, to be applied to problems in agent design.
Of particular importance is the availability of a clear semantics for non-numerical
data structures that are used to represent qualitative information about the world.
These are attractive features, ones we would like to preserve. However, the
traditional AI approach also has some other, less attractive, features which we hope to
eliminate. For example, in applications requiring continuous, high-speed
interaction with the environment, the computational cost of formally deriving facts from a
data base of logical premises and of keeping the data base consistent with the world
is often prohibitive. This has been a severe obstacle to building high-performance
embedded computer systems based on the model of the intelligent agent as symbolic
reasoner.

Situated-automata theory is a framework for reconciling the attractive features
of AI methods (non-numerical descriptions of the world) and of control-theoretic
methods (continuous constant-time updating of internal representations and
guaranteed response). The central observation of situated-automata theory is this: It is
not the run-time symbols or numbers, per se, that are of significance; it is the fact
that (1) they are semantically meaningful to the designer, that is, they stand for
well-defined world conditions, and (2) the machine is designed in such a way that
the world condition represented by the value of an internal location will indeed hold
when that location has that value.

In this paper, we apply the situated-automata framework to the problem of
synthesizing machines that track semantically complex conditions in the environment
using constant-time update circuitry. We describe how inference techniques can
be used at compile time to carry out the synthesis automatically, given symbolic
descriptions of the environment and of the information to be tracked.


2  Basic  concepts:  A  model  of  information 

The mathematical framework of situated-automata theory takes as its starting point
a model of dynamic systems. Consider a physical or computational system consisting
of a set of locations that can be in different states over time. These states can be
thought of as actual physical states or as abstract data values that might be stored
in the register of a computer. Let $T$ be a set of times, $L$ a set of locations, and let
each location $a$ take on values from some set, $D_a$, with compound locations $[a, b]$
taking on values in $D_a \times D_b$. Let the union of all value domains be designated by
$D$. Each possible “trajectory” of values can be given by a function $w : L \times T \to D$,
in which $w(a, t)$ is the value of location $a$ at time $t$ in trajectory $w$. Following the
terminology of possible-worlds semantics, we call these trajectories “worlds.”
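
As a minimal illustration of this definition, the sketch below enumerates worlds for a small finite model. The two locations, the three time steps, the binary value domains, and the helper names `all_worlds` and `value` are assumptions made for the example and do not come from the original system.

```python
from itertools import product

# Hypothetical toy model: two locations, three time steps, binary values.
LOCATIONS = ("a", "b")
TIMES = (0, 1, 2)
DOMAIN = {"a": (0, 1), "b": (0, 1)}  # the value sets D_a and D_b


def all_worlds():
    """Enumerate every trajectory w : L x T -> D as a dict keyed by (location, time)."""
    keys = [(loc, t) for loc in LOCATIONS for t in TIMES]
    choices = [DOMAIN[loc] for loc, _ in keys]
    return [dict(zip(keys, values)) for values in product(*choices)]


def value(w, loc, t):
    """Return w(loc, t); a compound location [a, b] is passed as a tuple of locations."""
    if isinstance(loc, tuple):
        return tuple(w[(single, t)] for single in loc)
    return w[(loc, t)]


worlds = all_worlds()
```

With no constraints imposed, every assignment of values to location-time pairs counts as a world; a compound location simply reads off a tuple of component values.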

In physical or computational systems that operate according to fixed rules or
constraints, not every world is consistent with the laws of nature. This can be
captured mathematically by identifying some designated subset of worlds that are
consistent with the intended constraints. We shall call this set of possible worlds
$W$. Let $\Phi$, the set of propositions or world conditions, be the set of all subsets of
$W \times T$. Intuitively, a condition $\varphi \in \Phi$ corresponds to the set of world-time points
at which that condition holds. We sometimes write $\varphi(w, t)$ rather than $(w, t) \in \varphi$
when we wish to assert that the condition $\varphi$ holds at $w, t$.

By definition, $\Phi$ has the structure of a Boolean algebra of sets: a condition $\varphi$
can imply (be a subset of) another condition $\psi$, we can take the meet, $\varphi \cap \psi$, of
two conditions, and so on. Furthermore, the structure of $\Phi$ allows us to define two
mathematical objects useful for characterizing world dynamics: the initial condition
$\varphi_0 = \{(w, 0) \mid w \in W\}$ and the strongest postcondition function $S : \Phi \to \Phi$,
where $S(\varphi) = \{(w, t + 1) \mid \varphi(w, t)\}$. The initial condition $\varphi_0$ and the strongest
postcondition function $S$ will be used later to characterize machine synthesis.
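
The set-theoretic definitions above can be exercised on a small finite model. The following sketch assumes a hypothetical single-location "latch" system (once the value is 1 it stays 1) and a finite time horizon; the world set and function names are illustrative only. Because the horizon is finite, the postcondition is truncated at the last time step, which is an artifact of the sketch and not of the theory.

```python
# Hypothetical toy model: one location whose value is a latch (once 1, it stays 1).
TIMES = (0, 1, 2, 3)

# A world is the tuple of the location's values over time; W keeps only the
# trajectories that obey the latch constraint.
W = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)]


def initial_condition():
    """phi_0 = {(w, 0) | w in W}."""
    return {(w, 0) for w in W}


def strongest_postcondition(phi):
    """S(phi) = {(w, t + 1) | phi(w, t)}, truncated at the last time in TIMES."""
    return {(w, t + 1) for (w, t) in phi if t + 1 in TIMES}


def implies(phi, psi):
    """A condition implies another when it is a subset of it."""
    return phi <= psi


# The condition "the location currently holds 1", as a set of world-time points.
latched = {(w, t) for w in W for t in TIMES if w[t] == 1}
```

In this model `implies(strongest_postcondition(latched), latched)` holds, which expresses that the latched condition is preserved from one time step to the next.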

The restriction on what is possible gives rise directly to a notion of information.
The information contained in a location’s value is modeled as the strongest
proposition consistent with that location’s having that value. Formally stated, to every
location (or compound location) $a$, we associate a function, $M_a : D_a \to \Phi$, that maps
$a$’s values into propositions. This function is defined as follows: $M_a(v) = \{(w, t) \mid
w(a, t) = v\}$. To say that a location $a$ has the information that $\varphi$ in world $w$ at
time $t$ is to say that $M_a(v)$ implies $\varphi$, where $v = w(a, t)$; in other words, that the
proposition $\varphi$ is true at each world and time in which $a$ has the same value it has in
world $w$ at time $t$.
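
To make the definition of $M_a$ concrete, the sketch below computes meaning sets by enumeration. The door-and-sensor model, the constraint that the sensor bit is 1 only when the door is open, and the names `meaning` and `has_information` are assumptions introduced for illustration.

```python
from itertools import product

TIMES = (0, 1)

# Hypothetical constraint: the machine location "a" is 1 only when the
# environment location "door" is open. Each instantaneous state is (door, a).
STATES = [("open", 1), ("open", 0), ("closed", 0)]

# A world is a tuple of instantaneous states, one per time step.
W = list(product(STATES, repeat=len(TIMES)))


def val(w, loc, t):
    """w(loc, t) for loc in {"door", "a"}."""
    return w[t][0 if loc == "door" else 1]


def meaning(loc, v):
    """M_loc(v) = {(w, t) | w(loc, t) = v}."""
    return {(w, t) for w in W for t in TIMES if val(w, loc, t) == v}


def has_information(loc, w, t, phi):
    """loc has the information that phi at (w, t) iff M_loc(w(loc, t)) is a subset of phi."""
    return meaning(loc, val(w, loc, t)) <= phi


door_open = meaning("door", "open")
door_closed = meaning("door", "closed")
```

Under these assumptions, a value of 1 at `a` carries the information that the door is open, while a value of 0 carries no information about the door either way, since both door states are consistent with it.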

As defined, the concept of information is very abstract, representing the
totality of what must be the case, given that some location in the machine has the
value it does. For this notion to be of practical use, we must find ways of
expressing in understandable terms particular, more limited, aspects of this total
information content. This is the proper role of logic. By defining logical
languages whose formulas express propositions of interest, we can conveniently
describe the content of propositions included in an agent’s information state, such as
$\mathit{in}(\mathit{book}, \mathit{room1}) \vee \mathit{in}(\mathit{book}, \mathit{room2})$. Furthermore, modal logics of knowledge can be
used to assert facts about the information relation itself, such as whether particular
locations have or do not have particular information, e.g.,
$\neg K(a, \mathit{in}(\mathit{book}, \mathit{room2})) \wedge K(b, \mathit{in}(\mathit{book}, \mathit{room2}))$,
which asserts that location $a$ does not contain the information
that the book is in room 2, while location $b$ does. These logics are explored
more fully in [Rosenschein and Kaelbling 1986] and [Halpern and Moses 1985]. In
this paper, we will use letters $p, q, \ldots$ and standard logical operators $\wedge, \vee, \ldots$ in
formulas that express the information carried in a location’s value.

When we wish to consider machines with very large state sets, we regard the
machine as being constructed from a network of components, with the state set
of the whole machine corresponding to the Cartesian product of the state sets of
the individual components. Fortunately, there are straightforward techniques for
inferring informational properties of aggregates from informational properties of
their components. For instance, the following can easily be shown to be valid:

$$M_{[a,b]}([u, v]) = M_a(u) \cap M_b(v)$$

We refer to this property as spatial monotonicity. It follows that if location $a$
carries the information that $p$ holds and location $b$ carries the information that $q$
holds, then the aggregate location $[a, b]$ carries the information that the conjunctive
condition $p \wedge q$ holds. Spatial monotonicity is useful in synthesis because it means
that subsystems can be developed independently and composed in a meaningful
way.
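
Spatial monotonicity can be checked exhaustively on a small model. In the sketch below, the environment conditions `p` and `q`, the machine locations `a` and `b`, and the constraints linking them (`a` is 1 only if `p` holds, `b` is 1 only if `q` holds) are hypothetical choices made for the example.

```python
from itertools import product

TIMES = (0, 1)
INDEX = {"p": 0, "q": 1, "a": 2, "b": 3}

# Instantaneous states (p, q, a, b) allowed by the assumed constraints:
# a = 1 only if p holds, and b = 1 only if q holds.
STATES = [s for s in product((False, True), (False, True), (0, 1), (0, 1))
          if (s[2] == 0 or s[0]) and (s[3] == 0 or s[1])]

# A world is a tuple of instantaneous states, one per time step.
W = list(product(STATES, repeat=len(TIMES)))


def val(w, loc, t):
    """w(loc, t); a compound location [a, b] is passed as a tuple of names."""
    if isinstance(loc, tuple):
        return tuple(w[t][INDEX[name]] for name in loc)
    return w[t][INDEX[loc]]


def meaning(loc, v):
    """M_loc(v) = {(w, t) | w(loc, t) = v}."""
    return {(w, t) for w in W for t in TIMES if val(w, loc, t) == v}


def spatially_monotone():
    """Check M_[a,b]([u, v]) == M_a(u) & M_b(v) for every pair of values."""
    return all(meaning(("a", "b"), (u, v)) == meaning("a", u) & meaning("b", v)
               for u in (0, 1) for v in (0, 1))


p_and_q = meaning("p", True) & meaning("q", True)
```

Here `spatially_monotone()` returns `True`, and `meaning(("a", "b"), (1, 1)) <= p_and_q` confirms that the aggregate location carries the conjunctive information when both components are set.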

It is important to observe that location $a$ can carry information about $p$ without
explicitly encoding a symbolic formula representing $p$; any value of the location that
is systematically correlated with $p$ will suffice. Different locations might have
different states representing the same proposition $p$, and the same data values might
have different informational significance at different locations. In general, an infinite
number of formulas will follow from the information contained in a finite value, but
since the formulas need not be separately represented, this causes no problem.
Indeed, the computational complexity of updating a location’s value so that it tracks
changing conditions in the environment is entirely decoupled from the
computational complexity of the symbolic inference problem for the logical language that
expresses the conditions being tracked. This fact is crucial for understanding how
seemingly complex semantic conditions can be monitored in real time.

These  considerations  lead  directly  to  certain  abstractions  useful  for  describing 
how  information  is  represented  in  machine  states  and  how  it  is  re-represented  as 
it  moves  from  location  to  location  within  a  machine.  We  call  these  abstractions 
informational  data  types. 


3  Informational  data  types 

Recall that the function $M_a$ maps $a$'s values (elements of $D_a$) to the abstract
propositions with which they are correlated. As such, we can think of it as a “meaning”
function. It is also useful to consider an inverse to these meaning functions, namely,
“representation” functions that map propositions back to the data values that
encode them. Let us define an informational data type to be a triple $\tau = (D, M, R)$,
with $D$ being a set of data values, $M$ a meaning function from $D$ to $\Phi$, and $R$ a
representation function from $\Phi$ to $D$. Intuitively, if a location is of type $\tau$, then
whenever it takes on the value $v \in D$, the world is intended to satisfy condition
$M(v)$, and if the world satisfies condition $\varphi$, it is appropriate for the location to take
on the value $R(\varphi)$. These two functions must satisfy the property that $\varphi$ implies
$M(R(\varphi))$ for all $\varphi \in \Phi$, that is, the representation map must be consistent with
the meaning map. Since the implication is only one way, the representation of a
world condition in the machine will, in general, not be information preserving. An
extreme case of this is when $M(R(\varphi)) = \mathit{true}$ and contains no information at all.
We generally assume $R(\varphi)$ to be maximally specific, that is, if $R(\varphi) = v_i$, then there
is no $v_j$ such that $M(v_j)$ implies, but is not implied by, $M(v_i)$.
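
A small sketch may make the triple $(D, M, R)$ concrete. It is an illustration under stated assumptions rather than anything taken from the report: the world is a single quantity ranging over 0..5, propositions are frozensets of such values, and the names `InfoType`, `meaning`, `represent`, and `is_consistent` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# Assumed toy world: one quantity with values 0..5.
WORLDS = frozenset(range(6))

@dataclass(frozen=True)
class InfoType:
    D: FrozenSet[int]                      # data values
    M: Callable[[int], FrozenSet[int]]     # meaning: value -> proposition
    R: Callable[[FrozenSet[int]], int]     # representation: proposition -> value

def meaning(v):
    """Value v means 'the quantity is at least v'."""
    return frozenset(w for w in WORLDS if w >= v)

def represent(phi):
    """Most specific lower bound entailed by phi (largest bound if phi is empty)."""
    return min(phi) if phi else max(WORLDS)

lower_bound = InfoType(D=WORLDS, M=meaning, R=represent)

def is_consistent(tau, phi):
    """Check that phi implies M(R(phi)), i.e. phi is a subset of M(R(phi))."""
    return phi <= tau.M(tau.R(phi))

phi = frozenset({1, 3})
print(is_consistent(lower_bound, phi))          # True
print(sorted(lower_bound.M(lower_bound.R(phi))))  # [1, 2, 3, 4, 5]
```

The second output shows the one-way nature of the implication: the proposition {1, 3} is represented by the bound 1, whose meaning is strictly weaker than the original proposition.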

Informational data types provide a way of analyzing the localization of
information in a machine, including the computational complexity of such localization.
Given two informational data types, $\tau_1 = (D_1, M_1, R_1)$ and $\tau_2 = (D_2, M_2, R_2)$, we
can define a translation function that “re-represents” the content implicit in the
values of a location of type $\tau_1$ in the language of $\tau_2$. Mathematically, the translation
function is a mapping $T_{12} : D_1 \to D_2$ that is defined as follows: $T_{12}(v) = R_2(M_1(v))$,
i.e., the representation, in the second “language,” of the meaning, in the first
“language,” of $v$. The computability and complexity of these translations remain to be
determined and are greatly affected by the choice of representation, that is, by the
specific nature of $M$ and $R$ and not merely by the range of propositions encoded.
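
The composition of a meaning function with a representation function can be sketched as follows. This is an illustrative example only: the first type encodes a lower bound on a quantity in 0..5, the second is a single bit meaning "the quantity is at least 3", and all names (`M1`, `M2`, `R2`, `translate`) are assumptions introduced here.

```python
# Assumed toy world: one quantity with values 0..5.
WORLDS = frozenset(range(6))

def M1(v):
    """Type 1: value v means 'quantity >= v'."""
    return frozenset(w for w in WORLDS if w >= v)

def M2(bit):
    """Type 2: bit 1 means 'quantity >= 3'; bit 0 carries no information."""
    return frozenset(w for w in WORLDS if w >= 3) if bit == 1 else WORLDS

def R2(phi):
    """Represent phi in type 2: emit 1 only when phi entails 'quantity >= 3'."""
    return 1 if phi <= M2(1) else 0

def translate(v):
    """Translation from type 1 to type 2: R2 applied to M1(v)."""
    return R2(M1(v))

print([translate(v) for v in range(6)])  # [0, 0, 0, 1, 1, 1]
```

Here the translation collapses to a simple threshold test, but that simplicity is a property of these particular meaning and representation functions, not of translations in general.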

Although the informational concepts developed thus far apply equally to finite
and infinite languages, we shall henceforth restrict our attention to machines having
a finite number of internal locations, each taking on values from a finite domain.
One immediate consequence is that all translation functions are computable,
although complexity trade-offs remain. For instance, since all Boolean functions can
be computed by a circuit of depth 2, we could always compute the translation
function in constant time, if we were prepared to tolerate the potentially large number
of computing elements that may be required. In the worst case, an exponential
number of gates could be needed and the constant-time result is merely academic.
Our aim, however, is to control the synthesis process in order to produce systems
that not only track world conditions but are also practical to design and implement.

In  the  next  section  we  discuss  how  informational  data  types  can  be  used  to 
approach  the  synthesis  problem. 


4  From analysis to synthesis: The machine induced by world dynamics

One  way  of  using  the  situated-automata  framework  is  for  the  analysis  of  existing 
machines:  Given  the  description  of  an  environment  and  of  a  machine  embedded  in 
that  environment,  we  seek  to  describe  the  information  encoded  in  its  states.  For 
purposes  of  design,  however,  we  are  more  interested  in  the  opposite  question:  Given 
a  description  of  the  world  and  of  the  information  we  would  like  to  have  encoded  in 
machine states, how can we design the machine's circuitry in such a way that states
of  the  machine  will  actually  be  correlated,  as  desired,  with  conditions  in  the  world? 

At the theoretical level, we can show that the dynamics of the world, together
with the semantics of the machine's inputs and the intended semantics of its internal
states (expressed as an informational data type), fully determine a machine whose
internal states carry the desired information by virtue of their actual correlation
with the world. To see why this is the case, imagine we are given an informational
data type $\tau_{in} = (D_{in}, M_{in}, R_{in})$ for the input location of a machine and an intended
data type $\tau_a = (D_a, M_a, R_a)$ for the internal location $a$. Imagine, further, that we
are given the proposition $\varphi_0$ approximating the initial world condition (in the sense
that $\varphi_0$ is implied by the true initial condition), and a function $S$ approximating the
true strongest-postcondition function (in the sense that $S(\varphi)$ is implied by the true
strongest postcondition of $\varphi$ for each $\varphi$; approximation is the best we can do, since
the world is not fully determined until we have fixed the embedded automaton).
We now show how these elements determine a machine that tracks changing world
conditions as desired.

Here a machine will be defined by a pair of domains $D_{in}$ and $D_a$ (for inputs and
internal states), an initial value $v_0 \in D_a$, and a next-state function $f : D_{in} \times D_a \to D_a$
satisfying the following conditions for all $w$, $t$:

$$w(a, 0) = v_0$$

$$w(a, t+1) = f(w(in, t), w(a, t))$$

Given $\tau_{in}$, $\tau_a$, $\varphi_0$, and $S$, we define $v_0$ and $f$ as follows:

$$v_0 = R_a(\varphi_0)$$

$$f(u, v) = R_a(S(M_{in}(u) \cap M_a(v)))$$

Intuitively, $v_0$, the initial value of the location $a$, is just the representation, in
$a$'s data type, of the initial proposition; the value of the next-state function is
determined mathematically by considering the proposition associated with the old
value of $a$ and with the input, determining what will be true one time instant in the
future given what is true now, and representing that proposition in the data type
of $a$. Note the implicit reliance on spatial monotonicity and the similarity between
this construction and the definition of translation functions in the previous section.
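
The construction of $v_0$ and $f$ can be traced on a toy environment. The sketch below is an assumption-laden illustration, not the report's implementation: the world is a quantity in 0..5 that either stays put or grows by one at each step, both the input and the internal location carry lower bounds on it, and every name (`at_least`, `R_a`, `S`, `PHI_0`, `run`) is hypothetical.

```python
# Assumed toy world: one quantity with values 0..5.
WORLDS = frozenset(range(6))

def at_least(v):
    """Proposition 'quantity >= v' as a set of world states."""
    return frozenset(w for w in WORLDS if w >= v)

M_in = at_least   # meaning of an input value (a sensed lower bound)
M_a = at_least    # meaning of an internal-state value

def R_a(phi):
    """Most specific lower bound entailed by phi."""
    return min(phi) if phi else max(WORLDS)

def S(phi):
    """Approximate strongest postcondition: the quantity stays or grows by one."""
    top = max(WORLDS)
    return frozenset(w2 for w in phi for w2 in (w, min(w + 1, top)))

PHI_0 = WORLDS          # initially nothing is known
v0 = R_a(PHI_0)         # initial value of location a

def f(u, v):
    """Induced next-state function: R_a(S(M_in(u) & M_a(v)))."""
    return R_a(S(M_in(u) & M_a(v)))

def run(inputs):
    """Feed a sequence of input values and return the sequence of states."""
    states = [v0]
    for u in inputs:
        states.append(f(u, states[-1]))
    return states

print(run([1, 2, 0, 4]))  # [0, 1, 2, 2, 4]
```

In this particular toy setting the induced function reduces to `max(u, v)`: the machine simply remembers the best lower bound seen so far, which remains valid because the quantity never decreases.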

Assuming $M_{in}$ is veridical, it can be shown that these definitions of $v_0$ and $f$
ensure that $M_a$ will be veridical as well, i.e., that the machine's states will indeed
be correlated with the intended meanings of those states. Mathematically, we must
demonstrate that for all $w$, $t$:

$$M_a(w(a,t))(w,t).$$

The proof of this proposition is straightforward and proceeds by induction on $t$.
(The variable $w$ is universally quantified throughout.) The base case is established
as follows: From the definition of $\varphi_0$, we have that $\varphi_0(w, 0)$. The soundness of
$R_a$ gives us $M_a(R_a(\varphi_0))(w, 0)$, whence $M_a(w(a,0))(w,0)$ follows immediately from the
definition of $v_0$ by simple substitution, since $w(a, 0) = v_0$.

The inductive case is established similarly. The induction hypothesis is that

$$M_a(w(a,t))(w,t),$$

and the veridicality of $M_{in}$ gives us

$$M_{in}(w(in,t))(w,t).$$

Conjoining these conditions yields

$$(M_{in}(w(in,t)) \cap M_a(w(a,t)))(w,t).$$

The definition of $S$ implies that

$$S(M_{in}(w(in,t)) \cap M_a(w(a,t)))(w,t+1),$$

and the soundness of $R_a$ guarantees that

$$M_a(R_a(S(M_{in}(w(in,t)) \cap M_a(w(a,t)))))(w,t+1).$$

Substituting in the definition of $f$, we get

$$M_a(f(w(in,t), w(a,t)))(w,t+1),$$

from which $M_a(w(a,t+1))(w,t+1)$ follows immediately.


5  Syntactifying  the  construction 

We have just seen how the dynamics of the environment, together with the
semantics of the inputs and the intended semantics of the internal state, completely
determine the structure of a machine. To be of practical utility, however, the
mathematical construction must be made operational. One approach would be for the
programmer, based on his intuitive understanding of the task environment, to define
the induced automaton directly in a conventional programming language. Although
adequate in principle, this approach is difficult to apply in practice for complex
domains. For this reason we seek compilation techniques that would automate at least
part of the synthesis process and make the transition from environment description
to automaton more transparent to the programmer.

Although the automaton is mathematically determined by $\tau_{in}$, $\tau_a$, $\varphi_0$, and $S$, we
cannot directly present these abstract objects to a compiler and must use symbolic,
often approximate, descriptions. Let us examine the form these descriptions might
take for informational data types ($\tau_{in}$ and $\tau_a$) and for world dynamics ($\varphi_0$ and $S$).


5.1  Specifying  informational  data  types 

Consider a machine location $x$ of informational data type $\tau_x = (D_x, M_x, R_x)$. Let
us see how the three components of the data type might be described to a compiler.
The value domain $D_x$ is straightforward to describe using conventional data type
declarations. For our purposes, it will be sufficient to consider only atomic data
types such as Booleans, integers, floats, etc. and record structures, possibly nested
over these atomic types. For example, to tell the compiler about the value domain
of location $x$ we might write x: [bool [int int] float].

The specification of the other two components is more complex. Let us begin
with $M_x$. Recall that $M_x$, the “meaning function” associated with location $x$, maps
elements of $D_x$ to $\Phi$, the set of propositions, where propositions are modeled as sets
of world-time pairs. Recall also that logical formulas can be used to partially express
the content of a proposition, provided they are taken from a logic that assigns sets of
world-time pairs as the denotation of formulas. Many temporal logics will suffice for
this purpose. (For one example, see [Rosenschein and Kaelbling 1986].) Given such
a logic, formulas parametrized by run-time values can be used to define a meaning
function from values to propositions.

Let $\mathcal{L}$ be the language of some temporal logic with interpretation
function $I : \mathcal{L} \to \Phi$ and provability relation $\vdash$. Assume that among its individual
terms the language has constants that rigidly designate values of locations, possibly
in addition to terms that denote location values that vary with world and time. If
$v$ is a data value, we let $c_v$ stand for the rigid designator of value $v$ in the language.

Let $P_x(U)$ be a formula of $\mathcal{L}$ with a free variable $U$ for which value-denoting
terms can be substituted. Each substitution instance $P_x(c_v)$ is a closed formula to
which the interpretation function $I$ can be applied. The parametric formula $P_x(U)$
induces a mapping $\hat{M}_x : D_x \to \Phi$ that approximates $M_x$ and is defined as
follows:

$$\hat{M}_x(v) = I(P_x(c_v)).$$

The semantic interpretation $I$ of the language $\mathcal{L}$ is itself approximated for the
compiler by a set, $\Gamma$, of background facts relative to which the syntactic
consequences of $P_x(c_v)$ are to be derived. To answer the question “does $M_x(u)$ imply
$M_y(v)$?” the compiler will attempt to establish $\Gamma \vdash P_x(c_u) \to P_y(c_v)$ using
deductive techniques.

Having approximated $M_x$ by a formula $P_x(U)$ and a background theory $\Gamma$, we
have nearly determined the third component of the informational data type as
well. Recall that the representation function $R_x$ is intended to map propositions
to elements of the value domain that best capture them. Since we are encoding
propositions for the compiler using formulas, we are interested in functions that map
formulas to data values. If $Q$ is a formula expressing the proposition $\varphi$, candidate
representations of $\varphi$ (relative to $\Gamma$) should be drawn from the set

$$C_x(Q) = \{ u \mid \Gamma \vdash Q \to P_x(c_u) \}.$$

Intuitively, elements of $C_x(Q)$ are the data values whose meaning is entailed by
the information in $Q$, and hence by $\varphi$. This set must contain at least one element,
since there must be a value in $D_x$ representing true, which is entailed by every
proposition. In practice, it is convenient to have multiple representations of true,
for instance by uniformly including a boolean valid bit in all data types. When the
valid bit has the value zero, the meaning of the whole value is taken to be simply
true, regardless of the values of the rest of the parameters.

If there is more than one element in $C_x(Q)$, the multiple values must somehow
be combined so that they might “fit” into the space allotted to $x$. We call the
functions responsible for combining these values rconj functions (“representation of
the conjunction”). For a type $\tau_x$, this function is defined as

$$\mathit{rconj}_x(v_1, v_2) = R_x(M_x(v_1) \cap M_x(v_2)).$$

Because the choice of $R_x$, and hence rconj, is underdetermined by the meaning
function $M_x$, the designer must somehow stipulate the $\mathit{rconj}_x$ function directly for
each type $\tau_x$. This can be made relatively convenient through the use of declarative
rules of the form

$$P_x(V_1) \wedge P_x(V_2) \to P_x(f(V_1, V_2)).$$

Having specified a binary rconj operator, arbitrary finite sets of values can be
combined in the obvious way:

$$\mathit{rconjm}_x(v_1, \ldots, v_n) = \mathit{rconj}_x(v_1, \ldots \mathit{rconj}_x(v_{n-1}, v_n) \ldots)$$

The order of combination does not matter since the rconj function is commutative
due to the commutativity of the underlying conjunction operator $\cap$ in terms of
which rconj is defined.

Example 

We illustrate the concepts above by defining a sample informational data type.
Consider a location $x$ of value type [bool int]. Informally, the first field is the
valid bit, and the second field is intended to represent a lower bound on the age of
some individual, Fred.

The semantics of $x$ can be expressed using the formula

$$P_x(U) = \mathit{age}(\mathit{fred}, [\mathit{first}(U), \mathit{second}(U)])$$

under the intended interpretation in which $\mathit{age}(\mathit{fred}, [V_1, V_2])$ denotes true when
$V_1 = 0$ and $\varphi(V_2)$ otherwise, where
$\varphi(V_2) = \{ (w,t) \mid \mathit{age}(\mathit{fred}, w, t) \geq \text{the number denoted by } V_2 \}$.
Thus, the run-time value $[0, n]$ at location $x$ would represent the vacuous proposition
true for any $n$, the value $[1, 14]$ would represent that Fred is at least 14 years old,
and so on.

An rconj rule for the informational data type might look like this:

$$\mathit{age}(\mathit{fred}, [U_1, U_2]) \wedge \mathit{age}(\mathit{fred}, [V_1, V_2]) \to \mathit{age}(\mathit{fred}, [\mathit{or}(U_1, V_1), \mathit{if}(U_1, \mathit{if}(V_1, \max(U_2, V_2), U_2), V_2)]).$$

This rule indeed defines a commutative rconj operator that can be used to
combine values of this informational data type in a way consistent with the intended
interpretation.
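
The rconj rule for the age type can be transcribed almost literally. The following sketch is illustrative only: it represents a value of type [bool int] as a Python tuple `(valid, bound)` with `valid` in {0, 1}, and the function names `rconj_age` and `rconjm_age` are assumptions made for this example.

```python
from functools import reduce

def rconj_age(u, v):
    """Combine two (valid, lower_bound) values following the rule in the text."""
    u_valid, u_bound = u
    v_valid, v_bound = v
    valid = u_valid | v_valid                      # or(U1, V1)
    if u_valid:
        # if(V1, max(U2, V2), U2)
        bound = max(u_bound, v_bound) if v_valid else u_bound
    else:
        bound = v_bound                            # V2
    return (valid, bound)

def rconjm_age(values):
    """Right-nested fold: rconj(v1, rconj(v2, ... rconj(v_{n-1}, v_n)))."""
    return reduce(lambda acc, v: rconj_age(v, acc), reversed(values))

print(rconj_age((0, 7), (1, 14)))                    # (1, 14)
print(rconjm_age([(1, 14), (0, 99), (1, 21)]))       # (1, 21)
```

One detail worth noting in this transcription: when both valid bits are 0, the ignored bound field of the result depends on argument order, although the meaning (true) is the same either way.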

Furthermore, if the intended model incorporates the constraint that 18-year-olds
can vote, we might include among the background facts an assertion of the form

$$\mathit{age}(\mathit{fred}, [U_1, U_2]) \to \mathit{can\text{-}vote}(\mathit{fred}, \mathit{and}(U_1, \mathit{ge}(U_2, 18))).$$

This fact implicitly defines part of a translation function from the age(fred, —) data
type to the can-vote(fred, —) data type. Notice that in this data type, fred is
fixed at compile time. This would be appropriate if distinct locations were used to
store information about individuals referred to explicitly at compile time. If Fred's
identity were not known until run time, the run-time parameter would have to take
on values that encoded propositions about Fred, Sam, etc.
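
Read as a run-time computation, the background fact above amounts to a one-line translation. The sketch below is an assumed rendering, with a hypothetical function name `age_to_can_vote` and the same `(valid, bound)` tuple encoding for the age value; the output bit 1 means "Fred can vote" and 0 carries no information.

```python
def age_to_can_vote(age_value):
    """Translate an age value (valid, lower_bound) into a can-vote bit.

    Mirrors and(U1, ge(U2, 18)): the bit is 1 only when the age
    information is valid and the lower bound is at least 18.
    """
    valid, bound = age_value
    return int(bool(valid) and bound >= 18)

print(age_to_can_vote((1, 21)))  # 1
print(age_to_can_vote((1, 14)))  # 0
print(age_to_can_vote((0, 30)))  # 0
```

A result of 0 does not assert that Fred cannot vote; a lower bound of 14 is compatible with any greater age, so the translated value simply carries no information.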

5.2  Specifying  World  Dynamics 

As we described above, the compiler is assumed to have access to a background
theory, that is, a set of assertions describing the environment. This theory will
contain many temporal facts as well as atemporal facts. By choosing the language
$\mathcal{L}$ to include appropriate temporal operators we can express facts about the initial
condition of the world and about temporal transitions in a way that allows us to
approximate the semantic objects $\varphi_0$ (initial condition) and $S$ (strongest postcondition
function).

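In the simplest case, the language $\mathcal{L}$ need only include the modal operators $\Box$,
init, and next satisfying the following semantic properties for all $w$, $t$:

$$w, t \models \Box\varphi \text{ iff } w', t' \models \varphi \text{ for all } w', t'$$

$$w, t \models \mathbf{init}\,\varphi \text{ iff } w, 0 \models \varphi$$

$$w, t \models \mathbf{next}\,\varphi \text{ iff } w, t+1 \models \varphi$$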
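
The three semantic clauses can be mimicked by higher-order functions. The sketch below is a rough illustration under explicit assumptions: a world is modeled as a Python function from time to a state, a formula as a predicate on (world, time), and the universal quantification in the box operator is truncated to a finite `HORIZON`, which is an approximation introduced here and not part of the logic. All names are hypothetical.

```python
HORIZON = 10  # assumed finite cutoff for checking "all times"

# Two hypothetical worlds, each mapping a time t to a state.
WORLDS = [lambda t: t, lambda t: 2 * t]

def init(phi):
    """init phi holds at (w, t) iff phi holds at (w, 0)."""
    return lambda w, t: phi(w, 0)

def nxt(phi):
    """next phi holds at (w, t) iff phi holds at (w, t + 1)."""
    return lambda w, t: phi(w, t + 1)

def box(phi):
    """box phi holds iff phi holds at every world and time (up to HORIZON)."""
    return lambda w, t: all(phi(w2, t2) for w2 in WORLDS for t2 in range(HORIZON))

def is_zero(w, t):
    return w(t) == 0

def is_even(w, t):
    return w(t) % 2 == 0

print(init(is_zero)(WORLDS[0], 5))        # True
print(nxt(is_even)(WORLDS[0], 1))         # True
print(box(init(is_zero))(WORLDS[0], 3))   # True
print(box(is_even)(WORLDS[0], 0))         # False
```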

The compiler can answer questions of the form “does $S(M_x(u))$ imply $M_y(v)$?”
by establishing $\Gamma \vdash \Box(P_x(c_u) \to \mathbf{next}\, P_y(c_v))$ using deductive techniques.

5.3  Putting the Pieces Together: Synthesis Method, Abstract Version

We now describe a compilation method that operates on the representations
discussed above and produces a circuit description of the desired automaton. The
compiler takes as inputs a description of information carried by the run-time
inputs to the machine and the internal machine state, as well as a background theory
containing temporal facts. The compiler operates by deriving theorems about what
is true initially and about what will be true next at any time, given what is true
at that time. In the course of the derivation, free variables are instantiated in the
manner of logic programming systems. From the instantiated formulas the
compiler extracts the initial value of the machine's internal state and the description of
a circuit for updating the machine's state vector.

More  precisely,  the  compiler’s  inputs  consist  of  the  following: 

•  a list $[a_1, \ldots, a_n]$ of input locations

•  a list $[b_1, \ldots, b_m]$ of internal locations

•  for each input location $a$, a formula $P_a(U)$ with free variable $U$

•  for each internal location $b$, a formula $P_b(U)$ with free variable $U$ and a function
$\mathit{rconj}_b$

•  a finite set $\Gamma$ of facts.

For each internal location $b$, the compiler computes two sets of value terms $I_b$
and $N_b$ defined as follows:

$$I_b = \{ e \mid \Gamma \vdash \Box\, \mathbf{init}\, P_b(e) \}$$

$$N_b = \{ e \mid \Gamma \vdash \Box(P_{a_1}(a_1) \wedge \ldots \wedge P_{b_m}(b_m) \to \mathbf{next}\, P_b(e)) \}.$$

If these sets are infinite, they can be generated and used incrementally. This is
discussed more fully below.

From these collections of sets the compiler computes the initial value and the
update function. The initial value is computed as follows:

$$v_0 = [\mathit{rconjm}_{b_1}(I_{b_1}), \ldots, \mathit{rconjm}_{b_m}(I_{b_m})].$$


In other words, the initial value of the state vector is the vector of values derived
by rconj-ing values representing the strongest propositions that can be inferred by
the compiler about the initial state of the environment in the “language” of each of
the state components. Similarly, for the next-state function:

$$f([a_1, \ldots, a_n], [b_1, \ldots, b_m]) = [\mathit{rconjm}_{b_1}(N_{b_1}), \ldots, \mathit{rconjm}_{b_m}(N_{b_m})].$$

Here the compiler constructs a vector of expressions that denote the strongest
propositions about what will be true next, again in the language of the state components.

In  the  case  of  the  initial  value,  since  all  the  terms  are  rigid,  the  rconj  values 
can  be  computed  at  compile  time.  In  the  case  of  the  next-state  function,  however, 
the  rconj  terms  will  not  denote  values  known  at  compile  time.  Rather,  they  will 
generally  be  nested  expressions  containing  operators  that  will  be  used  to  compute 
values  at  run  time.  Assuming  the  execution  time  of  these  operators  is  bounded, 
the  depth  of  the  expressions  will  provide  a  bound  on  the  update  time  of  the  state 
vector. 

Without restricting the background theory, we cannot guarantee that the sets
$I_b$ and $N_b$ will be finite. However, even in the unrestricted case the finiteness of
terms in the language guarantees that whichever elements we can derive at compile
time can be computed in bounded time at run time. Furthermore, the synthesis
procedure exhibits strongly monotonic behavior: the more elements of $I_b$ and $N_b$
we compute, the more information we can ascribe to run-time locations regarding
the environment. This allows incremental improvements to be achieved simply by
running the compiler longer; stopping the procedure at any stage will still yield a
correct automaton, although not necessarily the automaton attuned to the most
specific information available. Since, in general, additional rconj operations
consume run-time resources, one reasonable approach would be to have the compiler
keep track of run-time resources consumed and halt when some resource limit is
reached.
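
The anytime character of the procedure can be sketched for the initial value of a single location. This is an illustration under assumptions, not the compiler described in the report: the location holds a lower bound, `derive_terms` is a hypothetical stand-in for the stream of value terms the theorem prover would produce, and `limit` plays the role of the resource budget.

```python
from itertools import islice

def rconj_bound(u, v):
    """rconj for a lower-bound type: the tighter of two lower bounds."""
    return max(u, v)

def derive_terms():
    """Hypothetical stream of lower bounds, in the order they are proved."""
    yield from (10, 14, 21)

def synthesize_initial_value(limit):
    """Fold at most `limit` derived terms into one value with rconj."""
    result = 0  # the bound 0 carries no information
    for e in islice(derive_terms(), limit):
        result = rconj_bound(result, e)
    return result

print([synthesize_initial_value(k) for k in range(5)])  # [0, 10, 14, 21, 21]
```

Every prefix yields a valid lower bound, and a larger budget never yields a weaker one, which is the monotonic behavior described above.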

As we have observed, without placing restrictions on the symbolic language
used to specify the background theory, the synthesis method described above would
hardly be practical; it is obvious that environment-description languages exist that
make the synthesis problem not only intractable but undecidable. However, as
with Gapps [Kaelbling 1988] and other formalisms in the logic programming style,
by restricting ourselves to certain stylized languages, practical synthesis techniques
can be developed.

We have experimented with a restriction of the logical language that seems to
offer a good compromise between expressiveness and tractability. This restriction is
to a weak temporal Horn-clause language resembling Prolog but with the addition
of init and next operators. In this language the background theory is given as facts
of the following form:

init q(X,Y).

next q(X,f(X,Y)) :- q1(X,Y), ..., qn(X,Y).

q(X,f(X,Y)) :- q1(X,Y), ..., qk(X,Y).

For each predicate or function expression, the first argument represents a
compile-time term and the second a run-time term. Facts of the first two sorts assert
temporal facts (the $\Box$ operator is implicit), and facts of the last sort are ordinary
instantaneous facts, much as one would find in a conventional Prolog system, but
with terms syntactically marked as compile-time or run-time.

The rconj rules are given in the following form:

rconj q(x,f(x,y1,y2)) :- q(x,y1), q(x,y2).

The derivation process proceeds as described above but uses backward-chaining
deduction techniques adapted from logic programming. Each distinct location has
an associated atomic formula schema p(i,Y). In deriving the initial value $v_0$ the
compiler attempts to prove p(i,Y) from the init declarations and the
instantaneous facts. If this succeeds, the initial value of the i'th component of the state
vector is the rconj of the bindings of Y. If the attempt fails, the valid bit of that
component is set initially to 0. Similarly, the next-state function for component i is
derived by attempting to prove next p(i,Y) using the next rules and the
instantaneous facts, chaining backwards and cutting off proofs that traverse more than one
“next” clause. Although the process of finding the proofs need not be real-time,
the circuit that is finally produced is.

A prototype system, called RULER, has been built implementing the
Horn-clause version of the synthesis algorithm. The language resembles Prolog in many
ways, differing mainly in the strong distinction between compile-time and run-time
expressions. Compile-time expressions undergo unification in the ordinary manner;
run-time expressions, by contrast, are simply accumulated and used to generate
the circuit description. RULER was implemented in Lisp as an extension of the
Rex language [Kaelbling 1987]. Run-time expressions in RULER are allowed to be
any valid Rex expression, and all of the Rex optimizations (common-subexpression
elimination, constant folding, etc.) are applied to the resulting circuit descriptions
produced by RULER. The RULER system was run on several small examples
involving object tracking and aggregation, and the synthesis procedure has proved
tractable in our test implementation.


6  Future  Directions 

Our current research is directed toward extending the theoretical basis for synthesis
and improving the practical utility of tools such as RULER. On the theoretical side,
one important extension is to adapt the synthesis techniques to cases where the
correlation between machine states and world conditions is best described
probabilistically. Naive approaches will not work, primarily because the spatial-monotonicity
property fails in the probabilistic case. For this reason we have been exploring
design disciplines that reconcile structured synthesis methods with the inherently
non-monotonic nature of probabilities, preserving the spirit of the techniques
presented in this paper.

On the practical side, experiments with RULER suggest needed improvements in
several areas. One syntactic improvement would be to uniformly suppress valid bits,
since their treatment is so systematic. Freer syntactic intermingling of compile-time
and run-time expressions and tests would be useful as well. A more serious practical
consideration has to do with helping the programmer control the combinatorics
of the synthesis process. As in general logic programming, it is possible, using
RULER, to write programs with unacceptable combinatorial behavior. While this
is not the fault of RULER per se and can undoubtedly be ameliorated by increasing
the programmer's experience and skill in using the tool, there are improvements
that can be made in the system itself, including facilities for detecting cycles and
redundant proofs. Finally, there is a need to gain practical experience in applying
this style of programming to real problems in visual perception, sensor fusion, and
other similar areas.


7  Related  Work 

There has been considerable work on the synthesis of digital machines from temporal
logic specifications, for example, the work by Ben Moszkowski [Moszkowski 1983].
This work considers symbolic specifications similar to the kind considered here
but does not connect them directly to an informational account of machine states.
The work of Joseph Halpern and his associates [Halpern and Moses 1985], on the
other hand, has examined mathematical approaches closely related to our own for
characterizing information in distributed systems, but has so far not addressed
issues of automated synthesis. Chris Goad [Goad 1986] has used partial evaluation
to generate efficient algorithms for visual recognition. Goad's techniques are rather
domain-specific and do not handle tracking of conditions over time. There is a rich
literature in the traditional AI paradigm (as well as in formal philosophy) on belief
revision (see, for example, [Doyle 1979, de Kleer 1986]), but little work has addressed
the implications of real-time update requirements.


1  The Design of Embedded Agents

Embedded agents are computer systems that sense and act on their
environments, monitoring complex dynamic conditions and affecting the environment
in goal-directed ways. Systems of this kind are extremely difficult to design and
build, and without clear conceptual models and powerful programming tools,
the complexities of the real world can quickly become overwhelming. In certain
special cases, designs can be based on well-understood mathematical paradigms
such as classical control theory. More typically, however, tractable models of
this type are not available and alternative approaches must be used. One such
alternative is the situated-automata framework, which models the relationship
between embedded control systems and the external world in qualitative terms
and provides a family of programming abstractions to aid the designer. This
paper briefly reviews the situated-automata approach and then explores in greater
detail one aspect of the approach, namely the design of the action-generating
component of embedded agents.

1.1  The  Situated-Automata  Model 

The theoretical foundations of the situated-automata approach are based on
modeling the world as a pair of interacting automata, one corresponding to the
physical environment and the other to the embedded agent. Each has local
state that varies as a function of signals projected from the other. The aim of
the design process is to synthesize an agent, in the form of an embedded state
machine, that causes the desired effects in the environment over time.

*This work was supported in part by the Air Force Office of Scientific Research under
contract F49620-89-C-0055DEF and in part by the National Aeronautics and Space
Administration under Cooperative Agreement NCC-2-494 through Stanford University
subcontract PR-6359.

In  applications  of  interest,  it  is  often  useful  to  describe  the  agent  in  terms 
of  the  information  available  about  the  environment  and  the  goals  the  agent  is 
pursuing.  It  is  also  desirable  that  these  descriptions  be  expressed  in  language 
that  refers  to  states  of  the  environment  rather  than  to  specific  internal  data 
structures,  at  least  during  the  early  phases  of  design.  Moreover,  the  inputs, 
outputs,  and  internal  states  of  the  state  machine  will  be  far  too  numerous  to 
consider  explicitly,  which  means  the  machine  must  be  constructed  out  of  a  set  of 
separate  components  acting  together  to  generate  complex  patterns  of  behavior. 
These  requirements  highlight  the  need  for  compositional,  high-level  languages 
that  compactly  describe  machine  components  in  semantically  meaningful  terms. 

Situated-automata theory provides a principled way of interpreting data values
in the agent as encoding facts about the world expressed in some language
whose semantics is clear to the designer. Interpretations of this sort would be
of little use were it not also the case that whenever the data structure had a
particular value, the condition denoted was guaranteed to hold in the environment.
Such considerations motivate defining the semantics of data structures
in terms of objective correlations with external reality. In this approach, a machine
variable x is said to carry the information that p in world state s, written
$s \models K(x, p)$, if for all world states in which x has the same value it does in s,
the  proposition  p  is  true.  The  formal  properties  of  this  model  and  its  usefulness 
for  programming  embedded  systems  have  been  described  elsewhere  [9,11,5,10]. 
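
The definition of carrying information can be illustrated over a small, explicitly enumerated set of world states. The following is only a sketch under stated assumptions: the representation of world states as dictionaries, the function name `carries`, and the rain-sensor example are all hypothetical and are not part of the situated-automata tools.

```python
def carries(states, x, p, s):
    """Return True if variable x carries the information that p in state s.

    states: iterable of all possible world states (assumed finite here)
    x:      function reading the machine variable's value in a state
    p:      function giving the truth of the proposition in a state
    """
    value = x(s)
    # p must hold in every world state where x has the value it has in s
    return all(p(t) for t in states if x(t) == value)


# Hypothetical toy world: the sensor never reads True unless it is raining,
# but it may fail to detect rain.
states = [
    {"raining": True, "sensor": True},
    {"raining": True, "sensor": False},
    {"raining": False, "sensor": False},
]


def sensor(state):
    return state["sensor"]


def raining(state):
    return state["raining"]


knows_rain = carries(states, sensor, raining, states[0])
knows_dry = carries(states, sensor, lambda t: not raining(t), states[2])
```

In this toy world a sensor reading of True carries the information that it is raining, whereas a reading of False does not carry the information that it is dry, because that reading is also compatible with an undetected rain state.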

Having  established  a  theoretical  basis  for  viewing  a  given  signal  or  state  in 
the agent as carrying information content by virtue of its objective correlation
with the environment, one can consider languages in which this content might be
expressed.  In  general  there  will  be  no  single  “best”  language  for  expressing  this 
information.  For  example,  one  language  is  the  set  of  signals  or  states  themselves. 
These  can  be  regarded  as  a  system  of  signs  whose  semantic  interpretations  are 
exactly  the  conditions  with  which  they  are  correlated.  However,  the  designer 
will  typically  wish  to  employ  other,  higher-level,  languages  during  the  design 
process.  This  theme  will  be  expanded  upon  below  in  connection  with  goal- 
description  languages. 

1.2  Perception-Action  Split 

One  way  of  structuring  the  design  process  for  the  cognitive  ease  of  the  designer 
is  to  separate  the  problem  of  acquiring  information  about  the  world  from  the 
problem  of  acting  appropriately  relative  to  that  information.  The  former  we 
shall  label  perception  and  the  latter,  action.  In  terms  of  the  state-machine 
model,  as  shown  in  Figure  1,  the  perception  component  corresponds  to  the 
update  function  and  the  initial  state,  whereas  the  action  component  corresponds 
to  the  output  mapping. 

Figure 1: Division between perception and action components.

The  perception-action  split  in  itself  is  entirely  conceptual  and  may  or  may 
not  be  the  basis  for  modularizing  the  actual  system.  Horizontal  decompositions 
that cut across perception and action have been advocated by Brooks as a
practical  way  of  approaching  agent  design  [2].  The  horizontal  approach  allows 
the  designer  to  consider  simultaneously  those  limited  aspects  of  perception  and 
action  needed  to  support  specific  behaviors.  In  this  way,  it  discourages  the 
pursuit  of  spurious  generality  that  often  inhibits  practical  progress  in  robotics. 

These  attractive  features  are  counterbalanced,  however,  by  the  degree  to 
which  horizontal  decomposition  encourages  linear  thinking.  In  practice,  the 
methodology  of  not  separating  the  acquisition  of  information  from  its  use  tends 
to encourage the development of very specific behaviors rather than the identification
of elements that can recombine freely to produce complex patterns
of  behavior.  The  alternative  is  a  vertical  strategy  based  on  having  separate 
system  modules  that  recover  broadly  useful  information  from  multiple  sources 
and  others  that  exploit  it  for  multiple  purposes.  The  inherent  combinatorics 
of  information  extraction  and  behavior  generation  make  the  vertical  approach 
attractive  as  a  way  of  making  efficient  use  of  a  programmer’s  effort. 

The  commitment  to  a  decomposition  based  upon  the  perception-action  split 
still  leaves  open  the  question  of  development  strategy.  One  approach  is  to 
iteratively refine the perception-action pair, more or less in lockstep. The information
objectively carried by an input signal or an internal state is relative
to  constraints  on  other  parts  of  the  system — including  constraints  on  the  action 
component.  The  more  constrained  the  rest  of  the  system,  the  more  the  designer 
can  deduce  about  the  world  from  a  given  internal  signal  or  state,  hence  the  more 
“information”  it  contains.  As  the  designer  refines  his  design,  his  model  of  the 
information  available  to  the  system  and  what  the  system  will  do  in  response 
becomes  increasingly  specific. 

An  alternative  to  iterative  refinement,  suitable  in  many  practical  design 
situations,  is  the  strict  divide-and-conquer  strategy  in  which  the  design  of  the 
perception  component  is  carried  out  in  complete  isolation  from  the  development 
of  the  action  component  except  for  the  specification  of  a  common  interface — the 
data  structures  that  encode  the  information  shared  between  the  perception  and 
action  modules.  Although  there  may  be  occasions  when  the  designer  needs  to 
rely on some fact about what the agent will do in order to guarantee that a
certain  signal  or  state  has  the  semantic  content  he  intends,  if  these  situations 
can  be  minimized  or  ignored,  considerable  simplification  will  result. 

1.3  Goals 

As  we  have  seen,  one  way  of  semantically  characterizing  an  agent’s  states  is 
in  terms  of  the  information  they  embody.  The  perception  component  delivers 
information,  and  the  action  component  maps  this  information  to  action.  In 
many  cases,  however,  it  is  more  natural  to  describe  actions  as  functions  not 
only  of  information  but  of  the  goals  the  agent  is  pursuing  at  the  moment  [12]. 

Goals  can  be  divided  into  two  broad  classes:  static  and  dynamic.  A  static 
goal  is  a  statement  the  agent’s  behavior  is  simply  designed  to  make  true.  In 
reality, a static goal is nothing more than a specification, and as such the attribution
of this “goal” to the agent is somewhat superfluous, although it may be
of  pragmatic  use  in  helping  the  designer  organize  his  conception  of  the  agent’s 
action  strategy.  Dynamic  goals  are  another  matter.  The  ability  to  attribute  to 
the  agent  goals  that  change  dynamically  at  run  time  opens  the  possibility  of 
dramatically  simplifying  the  designer’s  description  of  the  agent’s  behavior. 

Since we are committed to an information-based semantics for reactive systems,
we seek an “objective” semantics of goals defined explicitly in informational
terms. We can reformulate the notion of having a goal p as having the
information that p implies a fixed top-level goal, called N for “Nirvana.” Formally,
we define a goal operator G as follows:

$$G(x, p) \equiv K(x, p \rightarrow N).$$

In  this  model,  x  has  the  goal  p  if  x  carries  the  information  that  p  implies 
Nirvana.1  This  definition  captures  the  notion  of  dynamic  goals  because  p  can 
be  an  indexical  statement,  such  as  “it  is  raining  now,”  whose  truth  varies  with 
time.  Since  this  model  defines  goals  explicitly  in  terms  of  information,  the  same 
formal  tools  used  to  study  information  can  be  applied  to  goals  as  well.  In  fact, 
under  this  definition,  goals  and  information  are  dual  concepts. 

To see the duality of goals and information, consider a function f mapping
values of one variable, a, to values of another variable, b. Under the information
interpretation, such a function takes elements having more specific information
into elements having less specific information. This is because functions generally
introduce ambiguity by mapping distinct inputs to the same output. For
example, if value $u_1$ at a is correlated with proposition p and value $u_2$ at a is
correlated with q and if f maps both $u_1$ and $u_2$ to v at b, the value v is ambiguous
as to whether it arose from $u_1$ or $u_2$, and hence the information it contains

1 We observe that under this definition False will always be a goal; in practice, however,
we are only interested in non-trivial goals.
is the disjunctive information $p \lor q$, which is less specific than the information
contained in either $u_1$ or $u_2$. Thus, functional mappings are a form of forgetting.
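
The forgetting effect of a many-to-one mapping can be seen by tabulating, for each value of a variable, the set of world states compatible with it. This is a rough sketch: the world names, the dictionary encoding, and the helper `information` are assumptions made only for illustration.

```python
def information(worlds, variable):
    """Map each value of a variable to the set of worlds compatible with it."""
    info = {}
    for name, world in worlds.items():
        info.setdefault(variable(world), set()).add(name)
    return info


# Hypothetical worlds: u1 at a is correlated with p, u2 at a with q.
worlds = {"p-world": {"a": "u1"}, "q-world": {"a": "u2"}}


def a(world):
    return world["a"]


def f(value):
    # A many-to-one function: both u1 and u2 are sent to v.
    return "v"


def b(world):
    return f(a(world))


info_a = information(worlds, a)
info_b = information(worlds, b)
```

Each value at a is compatible with a single world, while the value v at b is compatible with both, which corresponds to the disjunctive information p or q.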

Under the goal interpretation, this picture is reversed. The analog to “forgetting”
is committing to subgoals, which can be thought of as “forgetting”
that there are other ways of achieving the condition. For instance, let the objective
information at variable a be that the agent is hungry and that there is
a sandwich in the right drawer and an apple in the left. If the application of
a many-to-one function results in variable b’s having a value compatible with
the  agent’s  being  hungry  and  there  being  a  sandwich  in  the  right  drawer  and 
either  an  apple  in  the  left  drawer  or  not,  we  could  describe  this  state  of  affairs 
by  saying  that  variable  b  has  lost  the  information  that  opening  the  left  drawer 
would  be  a  way  of  finding  food.  Alternatively,  we  could  say  that  variable  b 
had  committed  to  the  subgoal  of  opening  the  right  drawer.  The  phenomena  of 
forgetting  and  commitment  are  two  sides  of  the  same  coin. 

We  can  relate  this  observation  to  axioms  describing  information  and  goals. 
One  of  the  formal  properties  satisfied  by  K  is  the  deductive  closure  axiom,  which 
can  be  written  as  follows: 

$$K(x, p \rightarrow q) \rightarrow (K(x, p) \rightarrow K(x, q)).$$

The  analogous  axiom  for  goals  is 

$$K(x, p \rightarrow q) \rightarrow (G(x, q) \rightarrow G(x, p)).$$

This  is  precisely  the  subgoaling  axiom.  If  the  agent  has  q  as  a  goal  and  carries 
the  information  that  q  is  implied  by  some  other,  more  specific,  condition,  p,  the 
agent  is  justified  in  adopting  p  as  a  goal.  The  validity  of  this  axiom  can  be 
established  directly  from  the  definition  of  G. 
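
One way to spell out this step is sketched below. It assumes the semantic reading of K given earlier (truth in every world state compatible with the current value of x); the symbol $V_s(x)$ for that set of states is a hypothetical name introduced only for this derivation.

```latex
\begin{align*}
&\text{Assume } K(x, p \rightarrow q) \text{ and } G(x, q) \text{ hold in state } s. \\
&G(x, q) \equiv K(x, q \rightarrow N)
  && \text{(definition of } G\text{)} \\
&\forall t \in V_s(x):\; t \models (p \rightarrow q) \text{ and } t \models (q \rightarrow N)
  && \text{(definition of } K\text{)} \\
&\forall t \in V_s(x):\; t \models (p \rightarrow N)
  && \text{(transitivity of implication)} \\
&K(x, p \rightarrow N) \equiv G(x, p)
  && \text{(definitions of } K \text{ and } G\text{)}
\end{align*}
```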

Given  these  two  ways  of  viewing  the  semantics  of  data  structures,  we  can 
revisit  the  state-machine  model  of  agents  introduced  above.  Rather  than  specify 
the  action  component  of  the  machine  as  a  function  of  one  argument  interpreted 
in purely “informational” terms, f(i), it may be much more convenient for designers
to define it as a function of two arguments, f'(g, i), where the g argument
is interpreted as representing the dynamic goals of the agent. Where does the g
input come from? Clearly, it must ultimately be computed from the agent’s current
information state as well as its static goals, $g_0$. As such, it must be equivalent
to some non-goal-dependent specification: $f(i) = f'(\mathrm{extract}(i, g_0), i)$.
Nevertheless, the decomposition into a goal-extraction module and a goal-directed
action  module  may  significantly  ease  the  cognitive  burden  for  the  designer  while 
leaving  him  secure  in  the  knowledge  that  it  is  semantically  grounded. 
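
The decomposition can be pictured with a small sketch. Everything here is hypothetical: the goal names, the information fields, and the functions `extract`, `act_on_goal`, and `act` are invented for illustration and reuse the drawer example from the text.

```python
def extract(info, static_goal):
    """Goal-extraction module: compute the current dynamic goal g
    from the information state i and the static goal g0."""
    if static_goal == "stay-fed" and info["hungry"]:
        return "find-food"
    return "idle"


def act_on_goal(goal, info):
    """Goal-directed action module f'(g, i)."""
    if goal == "find-food":
        if info["sandwich-in-right-drawer"]:
            return "open-right-drawer"
        return "open-left-drawer"
    return "wait"


def act(info, static_goal="stay-fed"):
    """The equivalent non-goal-dependent mapping f(i) = f'(extract(i, g0), i)."""
    return act_on_goal(extract(info, static_goal), info)


hungry_action = act({"hungry": True, "sandwich-in-right-drawer": True})
sated_action = act({"hungry": False, "sandwich-in-right-drawer": True})
```

The composite `act` depends on information alone, so the goal argument is a design convenience rather than an extra input to the machine.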

1.4  Software  Tools  for  Agent  Design 

Although  it  is  conceptually  important  to  have  a  formal  understanding  of  the 
semantics  of  the  data  structures  in  an  embedded  agent,  this  understanding  does 
not, directly, simplify the programmer’s task. For this reason, it is necessary to
design  and  implement  software  tools  that  are  based  on  proper  foundations  and 
that  make  it  easier  to  program  embedded  agents. 

Rex [5,7] is a language that allows the programmer to use the full recursive
power of Lisp at compile time to specify a synchronous digital circuit.
The circuit model of computation facilitates semantic analysis in the
situated-automata theory framework. Rex provides, however, a low-level, operational
language that is more akin to standard programming languages than to declarative
AI languages. For this reason, we have designed and implemented a pair of
declarative  programming  languages  on  top  of  the  base  provided  by  Rex.  Ruler 
[10]  is  based  on  the  “informational”  semantics  and  is  intended  to  be  used  to 
specify  the  perception  component  of  an  agent.  Gapps  [6]  is  based  on  the  “goal” 
semantics  and  is  intended  to  be  used  to  specify  the  action  component  of  an 
agent.  In  the  rest  of  this  paper,  we  will  describe  the  Gapps  language,  its  use 
in  programming  embedded  agents,  and  a  number  of  extensions  that  relate  it  to 
more  traditional  work  in  planning. 


2  Gapps 

In this section we describe Gapps, a language for specifying behaviors of computer
agents that retains the advantage of declarative specification, but generates
run-time programs that are reactive, do parallel actions, and carry out
strategies made up of very low-level actions.

Gapps  is  intended  to  be  used  to  specify  the  action  component  of  an  agent. 
The  Gapps  compiler  takes  as  input  a  declarative  specification  of  the  agent’s 
top-level  goal  and  a  set  of  goal-reduction  rules,  and  transforms  them  into  the 
description  of  a  circuit  that  has  the  output  of  the  perception  component  as  its 
input,  and  the  output  of  the  agent  as  a  whole  as  its  output.  The  output  of  the 
agent  may  be  divided  into  a  number  of  separately  controllable  actions,  so  that 
we  can  independently  specify  procedures  that  allow  an  agent  to  move  and  talk 
at  the  same  time.  A  sample  action  vector  declaration  is: 

    (declare-action-vector
      (left-wheel-velocity int)
      (right-wheel-velocity int)
      (speech string))

This  states  that  the  agent  has  three  independently  controllable  effectors  and 
declares  the  types  of  the  output  values  that  control  them. 

In  the  following  sections,  we  shall  present  a  formal  description  of  Gapps 
and  its  goal  evaluation  algorithm,  and  explain  how  Gapps  specifications  can  be 
instantiated  as  circuit  descriptions. 

2.1 Goals and Programs

The  Gapps  compiler  maps  a  top-level  goal  and  a  set  of  goal-reduction  rules  into 
a  program.  In  this  section  we  shall  clarify  the  concepts  of  goal,  goal-reduction 
rule,  and  program. 

There  are  three  primitive  goal  types:  goals  of  execution,  achievement,  and 
maintenance.  Goals  of  execution  are  of  the  form  do(a),  with  a  specifying  an 
instantaneous  action  that  can  be  taken  by  the  agent  in  the  world — the  agent’s 
goal  is  simply  to  perform  that  action.  If  an  agent  has  a  goal  of  maintenance, 
notated  maint(p),  then  if  the  proposition  p  is  true,  the  agent  should  strive 
to  maintain  the  truth  of  p  for  as  long  as  it  can.  The  goal  ach(p)  is  a  goal 
of  achievement,  for  which  the  agent  should  try  to  bring  about  the  truth  of 
proposition  p  as  soon  as  possible.  The  set  of  goals  is  made  up  of  the  primitive 
goal  types,  closed  under  the  Boolean  operators.  The  notions  of  achievement 
and maintenance are dual, so we have $\neg\,\mathrm{ach}(p) \equiv \mathrm{maint}(\neg p)$ and
$\neg\,\mathrm{maint}(p) \equiv \mathrm{ach}(\neg p)$.

In  order  to  characterize  the  correctness  of  programs  with  respect  to  the 
goals that specify them, we must have a notion of an action leading to a goal.
Informally, an action a leads to a goal G (notated $a \leadsto G$) if it constitutes a
correct step toward the satisfaction of a goal. For a goal of achievement, the
action must be consistent with the goal condition’s eventually being true; for
a goal of maintenance, if the condition is already true, the action must imply
that it will be true at the next instant of time. The leads to operator must also
have the following formal properties:


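$$\begin{aligned}
&\phantom{{}\Rightarrow{}} a \leadsto \mathrm{do}(a) \\
(a \leadsto G) \land (a \leadsto G') &\Rightarrow a \leadsto (G \land G') \\
(a \leadsto G) \lor (a \leadsto G') &\Rightarrow a \leadsto (G \lor G') \\
\mathrm{cond}(p,\; a \leadsto G,\; a \leadsto G') &\Rightarrow a \leadsto \mathrm{cond}(p, G, G') \\
(a \leadsto G) \land (G \rightarrow G') &\Rightarrow a \leadsto G'.
\end{aligned}$$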

This  definition  captures  a  weak  intuition  of  what  it  means  for  an  action  to  lead 
to  a  goal.  The  goal  of  doing  an  action  is  immediately  satisfied  by  doing  that 
action.  If  an  action  leads  to  each  of  two  goals,  it  leads  to  their  conjunction; 
similarly for disjunction and conditionals. The definition of leads to for goals
of achievement may seem too weak: rather than saying that doing an action
is consistent with achieving the goal, we would like somehow to say that the
action actually constitutes progress toward the goal condition. Unfortunately,
it is difficult to formalize this notion in a domain-independent way. In fact, any
definition of leads to that satisfies this definition is compatible with the goal
reduction  algorithm  used  by  Gapps,  so  the  definition  may  be  strengthened  for 
a  particular  domain. 

Goal reduction rules are of the form (defgoalr G G') and have the semantics
that the goal G can be reduced to the goal G'; that is, that G' is a
specialization of G, and therefore implies G. In this case, any action that leads
to G' will also lead to G.

A  program  is  a  finite  set  of  condition-action  pairs,  in  which  the  condition  is 
a run-time expression (actually a piece of Rex circuitry with a Boolean-valued
output)  and  an  action  is  a  vector  of  run-time  expressions,  one  corresponding 
to  each  primitive  output  field.  These  actions  are  run-time  mappings  from  the 
perceptual  inputs  into  output  values,  and  can  be  viewed  as  strategies,  in  which 
the  particular  output  to  be  generated  depends  on  the  external  state  of  the  world 
via  the  internal  state  of  the  agent.  Allowing  the  actions  to  be  entire  strategies 
is  very  flexible,  but  makes  it  impossible  to  enumerate  the  possible  values  of  an 
output  field.  In  order  to  specify  a  program  that  controls  only  the  speech  field 
of  an  action  vector,  we  need  to  be  able  to  describe  a  program  that  requires  the 
speech  field  to  have  a  certain  value,  but  makes  no  constraints  on  the  values 
of  the  other  fields.  One  way  to  do  this  would  be  to  enumerate  a  set  of  action 
vectors with the specified speech value, each of which has different values for the
other action vector components. Instead of doing this, we allow elements of an
action vector to contain the value $\emptyset$, which stands for all possible instantiations
of that field.

A program $\Pi$, consisting of the condition-action pairs $\{(c_1, a_1), \ldots, (c_n, a_n)\}$,
is said to weakly satisfy a goal G if, for every condition $c_i$, if that condition is
true, the corresponding action $a_i$ leads to G. That is,

$$\Pi \text{ weakly satisfies } G \iff \forall i.\; c_i \rightarrow (a_i \leadsto G).$$

Note  that  the  conditions  in  a  program  need  not  be  exhaustive — satisfaction  does 
not  require  that  there  be  an  action  that  leads  to  the  goal  in  every  situation,  since 
this  is  impossible  in  general.  We  will  refer  to  the  class  of  situations  in  which  a 
program  does  specify  an  action  as  the  domain  of  the  program.  We  define  the 
domain of $\Pi$ as

$$\mathrm{dom}(\Pi) = \bigvee_i c_i.$$

A goal G is strongly satisfied by program $\Pi$ if it is weakly satisfied by $\Pi$ and
$\mathrm{dom}(\Pi) = \mathit{true}$; that is, if for every situation, $\Pi$ supplies an action that leads
to  G.  The  conditions  in  a  program  need  not  be  mutually  exclusive.  When 
more  than  one  condition  of  a  program  is  true,  the  action  associated  with  each 
of  them  leads  to  the  goal,  and  an  execution  of  the  program  may  choose  among 
these  actions  nondeterministically. 
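
The definitions of weak satisfaction, domain, and strong satisfaction can be checked mechanically when the situations are enumerable. The sketch below rests on several assumptions that are not in the text: situations form a small finite set, conditions are Python predicates, and the leads-to relation is supplied as a hypothetical function `leads_to(action, situation)` for a fixed goal.

```python
def in_domain(program, situation):
    """dom(program): some condition of the program holds in the situation."""
    return any(cond(situation) for cond, _ in program)


def weakly_satisfies(program, leads_to, situations):
    """Whenever a condition holds, its action must lead to the goal."""
    return all(
        leads_to(action, s)
        for s in situations
        for cond, action in program
        if cond(s)
    )


def strongly_satisfies(program, leads_to, situations):
    """Weak satisfaction plus a domain that covers every situation."""
    return weakly_satisfies(program, leads_to, situations) and all(
        in_domain(program, s) for s in situations
    )


# Hypothetical goal: be at position 3 on a line of positions 0..3.
situations = range(4)


def leads_to(action, s):
    return (action == "right" and s < 3) or (action == "stay" and s == 3)


partial = [(lambda s: s < 3, "right")]
total = partial + [(lambda s: s == 3, "stay")]
```

Here `partial` weakly satisfies the goal but says nothing about position 3, while `total` also covers that situation and so satisfies the goal strongly.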

Given the non-deterministic execution model, we can give programs a declarative
semantics, as well. A program $\Pi = \{(c_1, a_1), \ldots, (c_n, a_n)\}$ can be thought
of as having the logical interpretation

$$\Bigl(\bigwedge_i (a_i \rightarrow c_i) \land \bigvee_i a_i\Bigr) \lor \neg \bigvee_i c_i.$$

Either  the  domain  of  the  program  is  false  (the  second  clause)  or  there  is  some 
action  that  is  executed  and  the  condition  associated  with  that  action  is  true. 

2.2 Recursive Goal Evaluation Procedure


Gapps is implemented on top of Rex, and makes use of constructs from the Rex
language to provide perceptual tests. There is not room here to describe the
details of the Rex language, so we refer the interested reader to other papers
[5,7]. Gapps programs are made up of a set of goal reduction rules and a top-level
goal-expression. The general form of a goal-reduction rule is

    (defgoalr goal-pat goal-expr),

where

    goal-pat  ::= (ach pat rex-params)
                | (maint pat rex-params)
    goal-expr ::= (do index rex-expr)
                | (and goal-expr goal-expr)
                | (or goal-expr goal-expr)
                | (not goal-expr)
                | (if rex-expr goal-expr goal-expr)
                | (ach pat rex-expr)
                | (maint pat rex-expr)

index is a keyword, pat is a compile-time pattern with unifiable variables, rex-expr
is a Rex expression specifying a run-time function of input variables, and
rex-params is a structure of variables that becomes bound to the result of a rex-expr.
The details of these constructs will be discussed in the following sections.

The  Gapps  compiler  is  an  implementation  of  an  evaluation  function  that 
maps  goal  expressions  into  programs,  using  a  set  of  goal  reduction  rules  supplied 
by  the  programmer.  In  this  section  we  shall  present  the  evaluation  procedure; 
we have shown that it is correct; that is, that given a goal G and a set of
reduction rules $\Gamma$, eval(G, $\Gamma$) weakly satisfies G.

Given a reduction-rule set $\Gamma$, we define the evaluation procedure as
follows:

    define eval(G)
      case first(G)
        do   : make-primitive-program(second(G), third(G))
        and  : conjoin-programs(eval(second(G)), eval(third(G)))
        or   : disjoin-programs(eval(second(G)), eval(third(G)))
        not  : eval(negate-goal-expr(second(G)))
        if   : disjoin-programs
                 (conjoin-cond(second(G), eval(third(G))),
                  conjoin-cond(negate-cond(second(G)), eval(fourth(G))))
        maint,
        ach  : for all R in Gamma such that match(G, head(R))
                 disjoin-programs(eval(body(R)))

We  shall  now  consider  each  of  these  cases  in  turn. 

Do 

The function make-primitive-program takes an index and a Rex expression
and  returns  a  program.  The  index  indicates  which  of  the  fields  of  the  action 
vector  is  being  assigned,  and  the  Rex  expression  denotes  a  function  from  the 
input  to  values  for  that  action  field.  It  is  formally  defined  as 

$$\text{make-primitive-program}(i, \textit{rex-expr}) = \{(\mathit{true}, (\emptyset, \ldots, \textit{rex-expr}, \ldots, \emptyset))\},$$

with the rex-expr in the ith component of the action vector. This program
allows any action so long as component i of the action is the strategy described
by rex-expr.
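
A primitive program can be mimicked in a few lines. This is a sketch and not the Gapps implementation: it assumes `None` as a stand-in for the unspecified value, Python functions of the perceptual inputs as stand-ins for Rex expressions, and the field names from the sample action vector declaration.

```python
# Field names taken from the sample declare-action-vector form.
FIELDS = ("left-wheel-velocity", "right-wheel-velocity", "speech")

UNSPEC = None  # assumed stand-in for the unspecified value


def always_true(inputs):
    """The condition 'true': holds for any perceptual input."""
    return True


def make_primitive_program(index, rex_expr):
    """Return a one-pair program that constrains only the field `index`."""
    action = tuple(rex_expr if name == index else UNSPEC for name in FIELDS)
    return [(always_true, action)]


def greet(inputs):
    # Hypothetical run-time strategy for the speech field.
    return "hello " + inputs["person"]


program = make_primitive_program("speech", greet)
condition, action = program[0]
```

The resulting action vector leaves both wheel velocities open, so it can later be merged with a program that constrains only the wheels.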

And 

Programs  are  conjoined  by  taking  the  cross-product  of  their  condition-action 
pairs and merging the two elements of each pair in the cross-product. In conjoining
two  programs,  the  merged  action  vector  is  associated  with  the  conjunction  of 
the  conditions  of  the  original  pairs,  together  with  the  condition  that  the  two 
actions  are  mergeable.  The  conjunction  procedure  simply  finds  the  pairs  in  each 
program  that  share  an  action  and  conjoins  their  conditions.  We  can  define  the 
operation  formally  as 

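$$\text{conjoin-programs}(\Pi', \Pi'') = \{(c'_i \land c''_j \land \mathrm{mergeable}(a'_i, a''_j),\; \mathrm{merge}(a'_i, a''_j))\}$$

for $1 \le i \le m$, $1 \le j \le n$, where

$$\Pi' = \{(c'_1, a'_1), \ldots, (c'_m, a'_m)\}, \qquad \Pi'' = \{(c''_1, a''_1), \ldots, (c''_n, a''_n)\}.$$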

The conjunction operation preserves the declarative semantics of programs; that
is, the semantic interpretation of the conjoined program is implied by the conjunction
of the semantic interpretations of the individual programs.

Two  action  vectors  are  mergeable  if,  for  each  component,  at  least  one  of  them 
is  unspecified  or  they  are  equal. 

$$\mathrm{mergeable}((a_1, \ldots, a_n), (b_1, \ldots, b_n)) \equiv \forall i.\; (a_i = \emptyset) \lor (b_i = \emptyset) \lor (a_i = b_i).$$

If either component is unspecified, the test can be completed at compile time
and  no  additional  circuitry  is  generated.  Otherwise,  an  equality  test  is  conjoined 
in  with  the  conditions  to  be  tested  at  run  time. 

Action  vectors  are  merged  at  the  component  level,  taking  the  defined  element 
if  one  is  available.  If  the  vectors  are  unequally  defined  on  a  component,  the  result 
is  undefined: 

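$$\mathrm{merge}((a_1, \ldots, a_n), (b_1, \ldots, b_n)) = (c_1, \ldots, c_n), \text{ where }
c_i = \begin{cases}
a_i & \text{if } b_i = \emptyset \text{ or } b_i = a_i \\
b_i & \text{if } a_i = \emptyset \\
\bot & \text{otherwise.}
\end{cases}$$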

The  merger  of  two  action  vectors  results  in  an  action  vector  that  allows  the 
intersection  of  the  actions  allowed  by  the  original  ones. 
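
The merge and conjunction operations can be sketched as follows. To keep the example small it assumes that action components are compile-time constants, so that the mergeability test can be decided immediately; in Gapps the components are run-time strategies and an equality test may have to be deferred to run time. The sentinels `UNSPEC` and `BOTTOM` are hypothetical stand-ins for the unspecified and undefined values.

```python
UNSPEC = None          # assumed stand-in for the unspecified value
BOTTOM = "<undefined>"  # assumed stand-in for the undefined result


def mergeable(a, b):
    """Each component pair is equal or has at least one unspecified member."""
    return all(x is UNSPEC or y is UNSPEC or x == y for x, y in zip(a, b))


def merge(a, b):
    """Component-wise merge, taking the defined element where one exists."""
    def component(x, y):
        if y is UNSPEC or y == x:
            return x
        if x is UNSPEC:
            return y
        return BOTTOM
    return tuple(component(x, y) for x, y in zip(a, b))


def conjoin_programs(p1, p2):
    """Cross-product of condition-action pairs with merged actions.

    Pairs whose constant actions clash are dropped, since their
    mergeable condition would be false."""
    result = []
    for c1, a1 in p1:
        for c2, a2 in p2:
            if mergeable(a1, a2):
                cond = lambda s, c1=c1, c2=c2: c1(s) and c2(s)
                result.append((cond, merge(a1, a2)))
    return result


always = lambda s: True
move = [(always, (5, 5, UNSPEC))]
talk = [(always, (UNSPEC, UNSPEC, "hi"))]
both = conjoin_programs(move, talk)
```

Conjoining the moving and talking programs yields a single pair whose action fills in all three fields, which is the sense in which independently specified behaviors can run at the same time.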

Or 

The  disjunction  of  two  programs  is  simply  the  union  of  their  sets  of  condition- 
action  pairs.  Stated  formally, 

$$\text{disjoin-programs}(\Pi', \Pi'') = \Pi' \cup \Pi''.$$

Not 


In Gapps, negation is driven into an expression as far as possible, using De
Morgan’s laws and the duality of ach and maint, until the only expressions
containing not are those of the form (ach (not pat)), (maint (not pat)),
and (not (do index rex-expr)). In the first two cases, there must be explicit
reduction rules for the goal; in the last case we simply return the empty program.
The handling of negation could be much stronger if we provided for the
enumeration of all possible values of any action vector component and required
them to be known constants at compile time. Then (not (do left-velocity
6)) would be the same as $\bigvee_{i \neq 6}$ make-primitive-program(left-velocity, i); that is,
license to go at any velocity but 6. As we noted before, these limitations are too
severe  for  use  in  controlling  a  complex  agent  that  has  large  numbers  of  possible 
outputs. 

The procedure negate-goal-expression rewrites goal expressions as follows:

    (not (and G1 G2))    =>  (or (not G1) (not G2))
    (not (or G1 G2))     =>  (and (not G1) (not G2))
    (not (not G))        =>  G
    (not (if c G1 G2))   =>  (if c (not G1) (not G2))
    (not (ach p))        =>  (maint (not p))
    (not (maint p))      =>  (ach (not p))
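
The rewrite rules above can be tried out with goal expressions encoded as nested tuples. This is a sketch under assumptions: the tuple encoding and the function name `negate` are hypothetical, and the function applies the rules recursively in one pass rather than leaving the inner negations for a later evaluation step.

```python
def negate(goal):
    """Return the negation of a goal expression with `not` driven inward."""
    op = goal[0]
    if op == "and":
        return ("or", negate(goal[1]), negate(goal[2]))
    if op == "or":
        return ("and", negate(goal[1]), negate(goal[2]))
    if op == "not":
        return goal[1]
    if op == "if":
        return ("if", goal[1], negate(goal[2]), negate(goal[3]))
    if op == "ach":
        # Duality of achievement and maintenance; extra arguments are kept.
        return ("maint", ("not", goal[1])) + goal[2:]
    if op == "maint":
        return ("ach", ("not", goal[1])) + goal[2:]
    if op == "do":
        # Negated primitive actions cannot be pushed any further.
        return ("not", goal)
    raise ValueError("unknown goal form: %r" % (op,))


negated = negate(("and", ("ach", "p"), ("do", "speech", "hi")))
```

The negation stops at (not (do ...)) forms, which the text says evaluate to the empty program.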

If


The evaluation procedure for conditional programs hinges on the definition of
the conditional operator cond(p, q, r) as $(p \land q) \lor (\neg p \land r)$. The procedure for
conjoining a condition and a program is defined as follows:

$$\text{conjoin-cond}(p, \Pi) = \{(p \land c_1, a_1), \ldots, (p \land c_n, a_n)\}.$$

Thus,

$$\text{disjoin-programs}(\text{conjoin-cond}(p, \Pi'), \text{conjoin-cond}(\neg p, \Pi'')) =
\{(p \land c'_1, a'_1), \ldots, (p \land c'_m, a'_m), (\neg p \land c''_1, a''_1), \ldots, (\neg p \land c''_n, a''_n)\}.$$
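
The conditional case can be sketched in the same style, with conditions as Python predicates over the perceptual inputs and programs as lists of condition-action pairs. The names `eval_if` and `facing`, and the string actions, are assumptions made for illustration.

```python
def conjoin_cond(p, program):
    """Strengthen every condition of the program with p."""
    return [(lambda s, c=c: p(s) and c(s), a) for c, a in program]


def disjoin_programs(p1, p2):
    """Union of the two sets of condition-action pairs."""
    return p1 + p2


def eval_if(p, then_program, else_program):
    """Program for (if p G1 G2), given the programs for G1 and G2."""
    return disjoin_programs(
        conjoin_cond(p, then_program),
        conjoin_cond(lambda s: not p(s), else_program),
    )


always = lambda s: True


def facing(inputs):
    # Hypothetical perceptual test.
    return inputs["facing"]


program = eval_if(facing, [(always, "forward")], [(always, "turn")])
enabled_when_facing = [a for c, a in program if c({"facing": True})]
enabled_otherwise = [a for c, a in program if c({"facing": False})]
```

Exactly one branch is enabled for any given input, because each pair carries either the test or its negation in its condition.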


Ach  and  Maint 

Goals of maintenance and achievement are evaluated by disjoining the results
of all applicable reduction rules in the rulebase $\Gamma$. A reduction rule whose head
is the expression (ach $\mathit{pat}_1$ rex-params) matches the goal expression (ach $\mathit{pat}_2$
rex-expr) if $\mathit{pat}_1$ and $\mathit{pat}_2$ can be unified in the current binding environment.
The patterns are s-expressions with compile-time variables that are marked by a
leading ?. The Rex expression and parameter arguments may be omitted if they
are null. The binding environment consists of other bindings of compile-time
variables within the goal expression being evaluated. Thus, when evaluating
the (ach (go ?p)) subgoal of the goal (and (ach (drive ?q ?p)) (ach (go
?p))), we may already have a binding for ?p. As in Prolog, evaluation of this
goal will backtrack through all possible bindings of ?p and ?q.

Once  a  pattern  has  been  matched,  Gapps  sets  up  a  new  compile-time  binding 
environment  for  evaluating  the  body  of  the  rule.  This  is  necessary  in  case 
variables  in  the  body  are  bound  by  the  invocation,  as  in 

    (defgoalr (ach (at ?p) [dist-err angle-err])
      (if (not-facing ?p angle-err)
          (ach (facing ?p) angle-err)
          (ach (moved-toward ?p) dist-err))).

In  the  rule  above,  (at  ?p)  is  a  pattern,  ?p  is  a  compile-time  parameter, 
dist-err  and  angle-err  are  Rex  variables,  and  (not-facing  ?p  angle-err) 
will be a Rex expression once a binding is substituted for ?p. A possible invocation
of this rule would be:

(ach  (at  (office-of  stan))  [*distance-eps*  10])  . 

Gapps  also  creates  a  new  Rex-variable  binding  environment  when  the  rule  is 
invoked,  binding  the  Rex  variables  in  the  head  to  the  evaluated  Rex  expressions 
in  the  invocation.  These  variables  may  appear  in  Rex  expressions  in  the  body  of 
the  rule.  Note  that  compile-time  variables  may  also  be  used  in  Rex  expressions, 
in  order  to  choose  at  compile  time  from  among  a  class  of  available  run-time 
functions. 
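The matching step described above can be sketched in Python. This is a simplified, one-way matcher rather than full unification: it assumes s-expressions are represented as nested tuples of strings and that only the rule-head pattern contains ?-variables. The function names are hypothetical and are not part of Gapps.

```python
from typing import Dict, Optional, Tuple, Union

Term = Union[str, Tuple["Term", ...]]
Bindings = Dict[str, Term]


def is_var(term: Term) -> bool:
    """Compile-time variables are marked by a leading '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern: Term, goal: Term, env: Bindings) -> Optional[Bindings]:
    """Match a rule-head pattern against a goal pattern.

    Returns an extended binding environment, or None on failure.
    Simplification: variables are bound only on the pattern side.
    """
    if is_var(pattern):
        if pattern in env:
            # An existing binding must agree with the goal.
            return env if env[pattern] == goal else None
        new_env = dict(env)
        new_env[pattern] = goal
        return new_env
    if isinstance(pattern, tuple) and isinstance(goal, tuple):
        if len(pattern) != len(goal):
            return None
        for sub_pattern, sub_goal in zip(pattern, goal):
            env = match(sub_pattern, sub_goal, env)
            if env is None:
                return None
        return env
    return env if pattern == goal else None
```

Matching the head pattern (at ?p) against the invocation (at (office-of stan)) binds ?p to (office-of stan); a binding for ?p already present in the environment constrains later matches in the same way.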
Figure 2: Circuit generated from Gapps program


2.3  Generating  a  Circuit 

Once a goal expression has been evaluated, yielding a program, a circuit similar
to the one shown in Figure 2 that instantiates the program is generated (see
footnote 2). Because any action whose associated condition is true is sufficient
for correctness, the conditions are tested in an arbitrary order that is chosen
at compile time. The output of the circuit is the action corresponding to the
first condition that is true. If no condition is satisfied, an error action is
output to signal the programmer that he has made an error. If, at the final stage
of circuit generation, there are still ∅ components in an action vector, they must
be instantiated with an arbitrary value. The inputs to the circuit are computed by
the Rex expressions supplied in the if and do forms. The outputs of the circuit
are used to control the agent.
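The run-time behavior of such a circuit can be approximated in software as a loop over the condition-action pairs. The sketch below assumes conditions are Boolean functions of a state dictionary; `ERROR_ACTION` is a hypothetical name for the error output mentioned above.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Program = List[Tuple[Callable[[State], bool], str]]

ERROR_ACTION = "error"  # hypothetical name for the error output


def run_circuit(program: Program, state: State) -> str:
    """Return the action paired with the first condition that is true."""
    for condition, action in program:
        if condition(state):
            return action
    # No condition is satisfied: the program is incomplete for this state.
    return ERROR_ACTION
```

Because any action whose condition is true is acceptable, the order of the pairs affects only which acceptable action is chosen, not correctness.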

2.4  Reducing  Conjunctive  Goal  Expressions 

Conjunctive goal expressions can have two forms: (ach-or-maint (and p1 p2))
and (and (ach-or-maint p1) (ach-or-maint p2)). Because of the properties of
maintenance, the goals (maint (and p1 p2)) and (and (maint p1) (maint p2))
are semantically equivalent. This is not true, however, for goals of achievement.
The goal (ach (and p1 p2)) requires that p1 and p2 be true simultaneously,
whereas the goal (and (ach p1) (ach p2)) requires only that they
each be true at some time in the future.

Goals of the form (ach-or-maint (and p1 p2)) can only be reduced using
reduction rules whose pattern matches this conjunctive pattern. Goals of
the form (and (ach-or-maint p1) (ach-or-maint p2)) can be reduced in two

2 An equivalent, but more confusing, circuit with log(n) depth can be generated for
improved performance on parallel machines.
ways: using the standard evaluation procedure for conjunctive goals and using
special reduction rules. It is often the case that an effective behavior for
achieving G1 and achieving G2 cannot be generated simply by conjoining programs
that achieve G1 and G2 individually. A program for the goal (and (ach
have hammer) (ach have saw)) will almost certainly be incomplete when the
two tools are in different rooms, because there will be no actions available that
are consistent with the standard programs for achieving each of the subgoals.
Because of this, we allow reduction rules of the form (defgoalr (and (ach-or-maint
pat1 rex-params1) (ach-or-maint pat2 rex-params2)) goal-expr) so that
special behaviors can be generated in the face of a conjunctive goal.

Following  is  an  example  that  illustrates  both  kinds  of  conjunctive  goals.  At 
the  top  level,  the  goal  is  to  have  the  hammer  and  saw  simultaneously,  but  this 
reduces  to  conjunctions  of  ach  and  maint  goals. 

(defgoalr (ach (and (have hammer) (have saw)))
  (if (have hammer)
      (and (maint have hammer) (ach have saw))
      (if (have saw)
          (and (maint have saw) (ach have hammer))
          (if (closer-than hammer saw)
              (ach have hammer)
              (ach have saw)))))

The agent will pursue the closer object until he has it, then pursue the second
while maintaining possession of the first. We might need a similar rule for
reducing the conjunctions of goals of achievement and maintenance. Instead of
the specific rule above, we could write a more generic sequencing rule, like the
following:

(defgoalr (ach (and ?g1 ?g2) [g1-params g2-params])
  (if (holds ?g1 g1-params)
      (and (maint ?g1 g1-params) (ach ?g2 g2-params))
      (if (holds ?g2 g2-params)
          (and (maint ?g2 g2-params)
               (ach ?g1 g1-params))
          (if (better-to-pursue ?g1 g1-params
                                ?g2 g2-params)
              (ach ?g1 g1-params)
              (ach ?g2 g2-params))))) .

The  generic  form  of  the  rule  assumes  that  there  is  a  Rex  function,  holds,  that 
takes  a  compile-time  parameter  and  generates  a  circuit  that  tests  to  see  whether 
the  predicate  encoded  by  the  compile-time  parameter  and  the  run-time  variables 
is  true  in  the  world. 

2.5  Prioritized  Goal  Lists 

It is often convenient to be able to specify a prioritized list of goals. In Gapps, we
can do this with a goal expression of the form (prio goal-expr1 ... goal-exprn).
The semantics of this is

$$\mathrm{cond}(\mathrm{dom}(\Pi_1), \Pi_1, \mathrm{cond}(\mathrm{dom}(\Pi_2), \Pi_2, \ldots, \mathrm{cond}(\mathrm{dom}(\Pi_{n-1}), \Pi_{n-1}, \Pi_n) \ldots)),$$

where $\Pi_i = \mathrm{eval}(\textit{goal-expr}_i)$. The domain of a program (true in a situation if
the program has an applicable action in that situation) is the disjunction of the
conditions in the program. A program for a prio goal executes the first program,
unless it has no applicable action, in which case it executes the second program,
and so on. At circuit-generation time, this construct can be implemented simply
by concatenating the programs in priority order, and executing the first action
whose corresponding condition is satisfied.
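The equivalence between the nested-cond semantics and simple concatenation can be checked with a small Python sketch. It assumes programs are lists of (condition, action) pairs executed by taking the first pair whose condition holds; `prio`, `concatenate`, and `first_action` are hypothetical helper names.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, bool]
Condition = Callable[[State], bool]
Program = List[Tuple[Condition, str]]


def dom(program: Program) -> Condition:
    """Domain of a program: the disjunction of its conditions."""
    return lambda s: any(c(s) for c, _ in program)


def prio(*programs: Program) -> Program:
    """Nested-cond semantics: each program is guarded by the negated
    domains of all higher-priority programs."""
    result: Program = []
    higher_domains: List[Condition] = []
    for program in programs:
        blockers = list(higher_domains)  # snapshot for the closures below
        for c, a in program:
            result.append(
                (lambda s, c=c, blockers=blockers:
                    not any(d(s) for d in blockers) and c(s), a))
        higher_domains.append(dom(program))
    return result


def concatenate(*programs: Program) -> Program:
    """Circuit-generation shortcut: programs in priority order."""
    return [pair for program in programs for pair in program]


def first_action(program: Program, state: State) -> Optional[str]:
    for c, a in program:
        if c(state):
            return a
    return None
```

Under first-match execution the two constructions select the same action in every state, which is why the cheaper concatenation suffices at circuit-generation time.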

An  example  of  the  use  of  the  prio  construct  comes  about  when  there  is  more 
than  one  way  of  achieving  a  particular  goal  and  one  is  preferable  to  the  other 
for  some  reason,  but  is  not  always  applicable.  We  might  have  the  rule 

(defgoalr (ach in-room r)
  (prio (ach follow-planned-route-to r)
        (ach use-local-navigation-to r))) .

This  rule  states  that  the  agent  should  travel  to  rooms  by  following  planned 
paths,  but  if  for  some  reason  it  is  impossible  to  do  that,  it  should  do  so  through 
local  navigation.  The  same  effect  could  be  achieved  with  an  if  expression,  but 
this  rule  does  not  require  the  higher-level  construct  to  know  the  exact  conditions 
under  which  the  higher-priority  goal  will  fail. 

2.6  Prioritized  Conjunctions 

An  interesting  special  case  of  a  prioritized  set  of  goals  is  a  prioritized  conjunction 
of  goals,  in  which  the  most  preferred  goal  is  the  entire  conjunction,  and  the  less 
preferred  goals  are  the  conjunctions  of  shorter  and  shorter  prefixes  of  the  goal 
sequence. We define (prio-and G1 G2 ... Gn) to be

(prio (and G1 G2 ... Gn)
      (and G1 G2 ... Gn-1)
      ...
      (and G1 G2)
      G1).
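The expansion of a prioritized conjunction into a prio form is purely syntactic, and a short Python sketch makes the prefix structure explicit. Goal expressions are assumed to be nested tuples; `expand_prio_and` is a hypothetical name, not a Gapps primitive.

```python
from typing import Sequence, Tuple


def expand_prio_and(goals: Sequence[tuple]) -> Tuple:
    """Rewrite (prio-and G1 ... Gn) as a prio over shrinking prefixes."""
    if not goals:
        raise ValueError("prio-and needs at least one goal")
    alternatives = []
    for k in range(len(goals), 0, -1):
        prefix = tuple(goals[:k])
        # A one-goal prefix is just the goal itself, not a conjunction.
        alternatives.append(prefix[0] if k == 1 else ("and",) + prefix)
    return ("prio",) + tuple(alternatives)
```

For three goals the result lists the full conjunction first, then the conjunction of the first two goals, then the first goal alone.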

Isaac  Asimov’s  three  laws  of  robotics  [1]  are  a  well-known  example  of  this 
type  of  goal  structure.  As  another  example,  consider  a  robot  that  can  talk  and 
push  blocks.  It  has  as  its  top-level  goal 

(prio-and (maint not-crashed)
          (ach (in block1 room3))
          (maint humans-not-bothered)) .
It also has rules that say that any action with the null string in the talking
field  will  maintain  humans-not-bothered;  that  (in  ?x  ?y)  can  be  achieved 
by  pushing  ?x  or  by  asking  a  human  to  pick  it  up  and  move  it;  and  that  any 
action  that  keeps  the  robot  from  coming  into  contact  with  a  wall  will  maintain 
not-crashed.  As  long  as  the  robot  can  push  the  block,  it  can  satisfy  all  three 
conditions.  If,  however,  the  block  is  in  a  corner,  getting  in  a  position  to  push 
it  would  require  sharing  space  with  a  wall,  thus  violating  the  first  subgoal.  The 
most  preferred  goal  cannot  be  achieved,  so  we  consider  the  next-most-preferred 
goal,  obtained  by  dropping  the  last  condition  from  the  conjunction.  Since  it  is 
now  allowed  to  bother  humans,  the  robot  can  satisfy  its  goal  by  asking  someone 
to  move  the  block  for  it.  As  soon  as  the  human  complies,  moving  the  block 
out  of  the  corner,  the  robot  will  automatically  revert  to  its  former  pushing 
behavior.  This  is  a  convenient  high-level  construct  for  programming  flexible 
reactive  behavior  without  the  need  for  the  programmer  to  explicitly  envision 
every  combination  of  conditions  in  the  world.  It  is  important  to  remember  that 
all  of  the  symbolic  manipulation  of  the  goals  happens  at  compile  time;  at  run 
time,  the  agent  simply  executes  the  action  associated  with  the  first  condition 
that  evaluates  to  true. 


3  Extending  Gapps 

Gapps is an appropriate language for specifying action maps that can be hardwired
at the compile time of the agent. In this section, we will consider ways
of  extending  and  augmenting  Gapps  to  do  exhaustive  planning  at  compile  time, 
to  do  run-time  planning,  and  to  do  run-time  goal  reduction. 

3.1  Universal  Planning  with  Goal-Reduction  Schema 

Schoppers  [13]  has  introduced  the  notion  of  a  universal  plan.  A  universal  plan 
is  a  function  that,  for  a  given  goal,  maps  every  possible  input  situation  of  the 
agent  into  an  action  that  leads  to  (in  an  informal  sense)  that  goal.  The  program 
resulting  from  the  Gapps-evaluation  of  a  goal  can  be  thought  of  as  a  universal 
plan,  mapping  situations  to  actions  in  service  of  the  top-level  goal. 

Schoppers’ approach differs from Gapps in that the user specifies the capabilities
of the agent in an operator-description language. This language allows
the  user  to  specify  a  set  of  atomic  capabilities  of  the  agent,  called  operators, 
and  the  expected  effect  that  executing  each  of  the  operators  will  have  on  the 
world,  depending  possibly  on  the  state  of  the  world  in  which  the  operator  was 
executed. 

Another way to characterize operators is through the use of a regression
function [8]. The relation q = regress(a, p) holds if, whenever q holds in the
world, the agent’s performing action a will cause p to hold in the world as a
result. In general, the regression function will return the weakest such q.
Regression is usually used to look backwards from a goal-situation p; the proposition
q describes a set of situations that are only one “step” or operator application
away from the set of situations satisfying p. We know that if the agent can get
to a situation satisfying q, it can easily get to a situation satisfying p.

The following schematic Gapps rule allows Gapps to do the exhaustive
backward-chaining search that is typically done by a planner, in order to construct a
universal plan. The Gapps compiler must be augmented slightly by giving it
a depth-bound for its backward chaining, because this rule would, by default,
cause infinite backward chaining.

(defgoalr (ach (before ?p ?q))
  (if (holds ?q)
      fail
      (if (holds ?p)
          (do anything)
          (if (holds (regress ?a ?p))
              (do ?a)
              (ach (before (regress ?a ?p) (regress ?a ?q)))))))

The reduction rule is for goals of the form (ach (before ?p ?q)); that is, the
goal is to achieve some condition ?p before some other condition ?q obtains.
This form of achievement goal is, we think, typical: it is rare that an agent
has a goal of achieving something no matter how long it takes. The rule works
as follows: if ?q is true in the world, the agent fails; if ?p holds in the world,
then the agent can do anything because it has achieved its goal; otherwise, if,
for any action ?a, (regress ?a ?p) holds (that is, performing action ?a will
cause ?p to hold next time) then this goal reduces to the goal (do ?a); finally,
this goal can be reduced to achieving, for any action ?a, (before (regress ?a
?p) (regress ?a ?q)). The final reduction says that it is good for the agent to
get into a state from which action ?a achieves the goal ?p before the agent gets
into a state from which action ?a achieves the releasing condition ?q, because
once that has been done, all the agent must do is do action ?a.
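The backward-chaining idea behind this rule can be sketched in Python over a small explicit state space. This is a simplification under several assumptions: states are enumerable integers, actions are deterministic functions, conditions are represented as sets of states, and the depth bound stands in for the releasing condition ?q. The "goal already holds, do anything" case is omitted. All names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple

State = int
Action = Callable[[State], State]
Program = List[Tuple[FrozenSet[State], str]]


def regress(action: Action, targets: FrozenSet[State],
            states: Iterable[State]) -> FrozenSet[State]:
    """States from which one application of `action` lands in `targets`."""
    return frozenset(s for s in states if action(s) in targets)


def universal_plan(goal: Iterable[State], actions: Dict[str, Action],
                   states: Iterable[State], depth: int) -> Program:
    """Depth-bounded backward chaining from the goal states.

    Pairs are emitted in order of increasing distance from the goal, so
    first-match execution always steps toward a strictly closer layer.
    """
    all_states = list(states)
    program: Program = []
    frontier = frozenset(goal)
    for _ in range(depth):
        reached = set()
        for name, action in actions.items():
            condition = regress(action, frontier, all_states)
            if condition:
                program.append((condition, name))
                reached |= condition
        frontier = frozenset(reached)
    return program


def choose(program: Program, state: State) -> Optional[str]:
    """First action whose condition (a state set) contains the state."""
    for condition, name in program:
        if state in condition:
            return name
    return None
```

On a number line from 0 to 5 with goal state 3 and a depth bound of 2, the resulting program covers states within two steps of the goal and has no action for state 0, illustrating that a depth-bounded universal plan can be incomplete.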

Consider the application of this process to the standard 3-block blocks-world
problem. The actions are named atoms, like pab, which signifies “put a on b.”
The world is described by predicates like ca, which signifies “clear a” and obt,
which signifies “on b table.” An additional predicate, time(i), is true if the time
on some global clock, which starts at 0, is i. We will use the abbreviation $t_i$ to
stand for time(i). Given the goal (ach (before (and oab obc) (time 2))),
the evaluation procedure returns a program that is described propositionally as
follows:


$$\{((\lnot t_2 \land obc \land ca \land cb),\ pab),$$
$$((\lnot t_2 \land \lnot t_1 \land obc \land ca \land cb),\ pat),$$
$$((\lnot t_2 \land \lnot t_1 \land obc \land oab \land ca),\ pat),$$
$$((\lnot t_2 \land \lnot t_1 \land ca \land cb \land cc),\ pbc),$$
$$((\lnot t_2 \land \lnot t_1 \land oba \land cb \land cc),\ pbc)\}.$$

According to this program, if b is on c, a and b are clear, and it is not time 2,
then the agent can put a on b; otherwise, if it is neither time 1 nor time 2, the
agent can do a variety of other things. For instance, if b is on c and a and b
are clear, the agent can put a on the table. This illustrates the generality of
the program. Because it is not yet time 1, it is acceptable to undo progress (we
might have some other reason for wanting to do this), because there is time to
put a back on b before time 2. Notice that this program is not complete. There
are  situations  for  which  it  has  no  action,  because  there  are  block  configurations 
that  cannot  be  made  to  satisfy  the  goal  in  two  actions.  Notice  also  that,  because 
this  is  a  program  of  the  standard  form  used  by  Gapps,  it  can  be  conjoined  in 
with  programs  arising  from  other  goals,  such  as  global  maintenance  goals.  Its 
generality,  in  allowing  any  sequence  of  actions  that  achieves  the  first  condition 
before  the  second,  makes  it  more  likely  that  conjoining  it  in  with  a  program 
expressing  some  other  constraint  will  result  in  a  non-null  program. 

3.2  Working  In  Parallel  With  an  Anytime  Planner 

When the size of the state space is so large that doing exhaustive planning at
compile time is impractical, it is possible to solve problems described as planning
problems by integrating a run-time planning system with the Gapps framework.

We  can  express  the  planning  process  as  an  incremental  computation,  one  step 
of  which  is  done  on  each  tick.  On  each  tick  the  process  generates  an  output,  but 
it  may  be  one  that  means  “I  don’t  have  an  answer  yet."  After  some  number  of 
ticks,  depending  on  the  size  of  the  planning  problem,  the  planner  will  generate  a 
real  result.  This  result  could  be  cached  and  executed  as  in  a  traditional  system, 
or  the  agent  could  just  take  the  first  action  and  wait  for  the  planner  to  generate 
a  new  plan. 

Because  time  may  have  passed  since  the  planner  began  its  task,  we  must  take 
care  that  the  plan  it  generates  is  appropriate  for  the  situation  the  agent  finds 
itself  in  when  the  planner  is  finished.  This  can  be  guaranteed  if  the  planner 
monitors  the  conditions  in  the  world  upon  which  the  correctness  of  its  plan 
depends.  If  any  of  these  conditions  becomes  false,  the  planner  can  begin  again. 
This  behavior  will  be  correct,  though  not  always  optimal.  In  the  worst  case, 
the  planner  will  continuously  emit  the  “I  don’t  know”  output  and  the  agent  will 
react reflexively to its environment without the benefit of a plan.
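The tick-by-tick planner with condition monitoring can be sketched as a small Python class. The sketch assumes planning takes a fixed number of ticks, that the part of the world the plan depends on can be extracted by a function, and that planning restarts whenever that part changes. The class and parameter names are hypothetical.

```python
from typing import Any, Callable, List, Optional

_UNSET = object()  # marks "no snapshot taken yet"


class IncrementalPlanner:
    """Does one unit of planning work per tick."""

    def __init__(self, ticks_needed: int,
                 make_plan: Callable[[Any], List[str]],
                 depends_on: Callable[[dict], Any]) -> None:
        self.ticks_needed = ticks_needed
        self.make_plan = make_plan      # snapshot -> plan
        self.depends_on = depends_on    # state -> conditions the plan relies on
        self.snapshot: Any = _UNSET
        self.progress = 0

    def tick(self, state: dict) -> Optional[List[str]]:
        """Return a plan, or None meaning "I don't have an answer yet"."""
        relevant = self.depends_on(state)
        if relevant != self.snapshot:
            # A monitored condition changed: begin again.
            self.snapshot = relevant
            self.progress = 0
        self.progress += 1
        if self.progress < self.ticks_needed:
            return None
        return self.make_plan(self.snapshot)
```

If a monitored condition changes while planning is under way, the planner restarts and keeps emitting the "no answer yet" output until it has worked for enough uninterrupted ticks.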

The kind of planner discussed above is a degenerate form of an anytime
algorithm [3]. An anytime algorithm always has an answer, but the answer
improves  over  time.  In  the  example  given  above,  the  answer  is  useless  for  a  while, 
then  improves  dramatically  in  one  step.  It  might  be  useful  to  have  planning 
algorithms  that  improve  more  gradually.  Such  algorithms  exist  for  certain  kinds 
of  path  planning,  for  instance,  in  which  some  path  is  returned  at  the  beginning, 
but  the  algorithm  works  to  make  the  path  shorter  or  more  efficient.  There  is 
still a difficult decision to be made, however, about whether to take the first
step  of  a  plan  that  is  known  to  be  non-optimal  or  to  spend  more  time  planning. 
For  many  everyday  activities,  optimality  is  not  crucial,  and  it  will  be  sufficient 
to  act  on  the  basis  of  a  simple  plan,  if  a  plan  is  required  at  all. 

From the perspective of Gapps, the anytime planner is just a perceptual
process that has state. It is "perceiving" conditions of the form: "the world is
in a state such that if I do action α followed by action β, followed by action γ,
my goal will be achieved." The following Gapps program makes use of such a
planner, but also has the potential for reacting to emergency situations:

(defgoalr (ach (in room) [r t])
  (if (know-plan-for-getting-to-room r t)
      (ach execute-first-step
           (plan-for-getting-to-room r t))
      (if (time-is-critical-for-getting-to-room r t)
          (ach drive-in-the-direction-of-room r)
          (maint sit-still)))) .

If the agent has the goal of being in room r at time t, and he knows a plan
for getting there, then he should execute the first step of that plan; otherwise,
if it looks like time is running out, the agent should do the best action he can
think of at the moment; if there is no problem with time, his best course of
action is to sit still and wait until the perception component has produced a
plan. These issues of combining planning and reactive action are explored more
fully by Kaelbling [4].

3.3  Run-Time  Goals 

So  far,  we  have  only  addressed  the  case  in  which  the  agent’s  top-level  goal  is 
specified  at  compile  time.  It  will  often  be  the  case  that  it  is  useful  to  think  of  the 
agent  as  acquiring  goals  at  run  time.  Before  we  can  discuss  ways  of  processing 
run-time  goals,  we  must  understand  their  semantics. 

3.3.1  Dispatching 

The  simplest  case  of  responding  to  run-time  goals  is  to  consider  them  to  be 
another  type  of  perceived  information  and  write  goal-reduction  rules  that  are 
conditional  on  the  given  goal.  As  an  example  of  this,  an  agent  could  be  given  the 
static  compile-time  goal  of  following  orders  and  reduction  rules  of  the  following 
form: 

(defgoalr (maint follow-orders)
  (if (current-request-pending)
      (ach goal-encoded-by (perceived-command))
      (do twiddle-thumbs)))

(defgoalr (ach goal-encoded-by params)
  (if (move-command params)
      (ach do-move-command (get-destination params))
      (if (stop-command params)
          (ach stopped)
          ...))) .

The  agent  will  carry  out  requests  as  it  perceives  them  by  dispatching  to 
the  right  goal-reductions  based  on  the  nature  of  the  request.  This  method  is 
sufficient  for  many  cases,  but  requires  the  run-time  goals  to  be  of  a  few  limited 
types,  because  the  different  types  must  be  tested  for  and  dispatched  to  directly. 

3.3.2  Run  Time  Goal  Reduction 

An  alternative  to  explicit  dispatching  on  the  types  of  goals  is  to  interpret  Gapps- 
style  goal-reduction  rules  at  run  time.  An  interpreter  for  Gapps  is  very  similar 
to  the  evaluation  procedure,  except  that  the  result  at  each  step  is  a  set  of 
possible  actions,  rather  than  a  set  of  condition-action  pairs.  This  is  because 
the  interpretation  is  taking  place  at  run  time,  which  allows  all  of  the  conditions 
to  be  evaluated  during  the  interpretation  process,  rather  than  combined  into  a 
program  that  is  to  be  evaluated  later.  Any  action  can  be  chosen  from  the  set 
resulting  from  interpreting  the  top-level  goal  in  the  current  situation. 

Given a reduction-rule set Gamma, we define the interpretation procedure as
follows:

define interp(G)
  case first(G)
    do:    make-action-set(second(G), rex-eval(third(G)))
    and:   conjoin-action-sets(interp(second(G)), interp(third(G)))
    or:    disjoin-action-sets(interp(second(G)), interp(third(G)))
    not:   interp(negate-goal-expr(second(G)))
    if:    if rex-eval(second(G)) then
             interp(third(G)) else
             interp(fourth(G))
    maint,
    ach:   for all R in Gamma such that match(G, head(R))
             disjoin-action-sets(interp(body(R)))

The  function  make-action-vector  takes  an  index  and  a  value  and  returns  the 
singleton  set  containing  the  action  vector  with  the  field  specified  by  the  index 
set  to  the  indicated  value.  That  is, 

$$\text{make-action-vector}(i, v) = \{\langle \emptyset, \ldots, v, \ldots, \emptyset \rangle\}.$$

The  value  is  calculated  by  evaluating,  in  the  current  state  of  the  world,  the  Rex 
expression  specifying  the  primitive  action.  Using  the  functions  mergeable  and 
merge  described  in  Section  2.2,  the  conjunction  of  action  sets  can  be  defined  as 

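$$\text{conjoin-action-sets}(A', A'') = \{\mathrm{merge}(a'_i, a''_j) \mid \mathrm{mergeable}(a'_i, a''_j)\}$$

for $1 \le i \le m$, $1 \le j \le n$, where

$$A' = \{a'_1, \ldots, a'_m\}, \qquad A'' = \{a''_1, \ldots, a''_n\}.$$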

The  disjunction  of  two  action  sets  is  simply  the  union  of  the  sets: 

$$\text{disjoin-action-sets}(A', A'') = A' \cup A''.$$

The crucial difference between the interpretation procedure and the evaluation
procedure is in the if case. When the interpreter encounters an if
goal, it can simply test the condition in the current state of the world and go
on to interpret the subgoal corresponding to the result of the test. This
obviates the need for manipulating formal descriptions of conditions during the
goal-interpretation process.
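The interpretation procedure can be approximated by a short Python sketch. Several assumptions are made here: goal expressions are nested tuples, Rex expressions are modeled as Python functions of the current state, action vectors are tuples in which None plays the role of an unspecified component, rule heads are matched by plain equality rather than pattern matching, and the not case is omitted. The definitions of `mergeable` and `merge` are guesses at the Section 2.2 functions, which are not reproduced in this section.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

State = Dict[str, bool]
Vector = Tuple[Optional[str], ...]
Rule = Tuple[tuple, tuple]  # (head, body)


def mergeable(a: Vector, b: Vector) -> bool:
    """Assumed: vectors agree wherever both components are specified."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def merge(a: Vector, b: Vector) -> Vector:
    """Assumed: take the specified component from either vector."""
    return tuple(y if x is None else x for x, y in zip(a, b))


def make_action_set(index: int, value: str, width: int) -> Set[Vector]:
    vec: List[Optional[str]] = [None] * width
    vec[index] = value
    return {tuple(vec)}


def conjoin_action_sets(a: Set[Vector], b: Set[Vector]) -> Set[Vector]:
    return {merge(x, y) for x in a for y in b if mergeable(x, y)}


def interp(goal: tuple, state: State, rules: List[Rule], width: int) -> Set[Vector]:
    """Return the set of action vectors acceptable in the current state."""
    op = goal[0]
    if op == "do":
        return make_action_set(goal[1], goal[2](state), width)
    if op == "and":
        return conjoin_action_sets(interp(goal[1], state, rules, width),
                                   interp(goal[2], state, rules, width))
    if op == "or":
        return (interp(goal[1], state, rules, width)
                | interp(goal[2], state, rules, width))
    if op == "if":
        # The condition is tested directly in the current state.
        branch = goal[2] if goal[1](state) else goal[3]
        return interp(branch, state, rules, width)
    if op in ("ach", "maint"):
        result: Set[Vector] = set()
        for head, body in rules:
            if head == goal:  # simplification: equality, not unification
                result |= interp(body, state, rules, width)
        return result
    raise ValueError(f"unknown goal form: {op!r}")
```

Because every if condition is evaluated on the spot, the interpreter never builds condition-action pairs; it returns only the actions that are acceptable right now, and an empty set signals that no action satisfies the goal.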

If  the  rule  set  is  fixed  at  compile  time  and  is  not  recursive,  interpretation 
can  be  done  by  a  fixed  circuit  (written,  perhaps,  in  Rex)  whose  depth  is  equal 
to  the  length  of  the  maximum-length  chain  of  rules  in  the  rule  set.  If  the  rule 
set  is  recursive,  a  depth  bound  will  have  to  be  imposed  in  order  to  guarantee 
real-time response. Another possibility would be to make this into an anytime
algorithm  by  using  iterative  deepening  search  over  the  course  of  a  number  of 
ticks,  and  being  careful  that  conditions  that  have  already  been  evaluated  do  not 
change  their  values  during  the  search  process. 

If the agent acquires goal reduction rules at run time, perhaps through learning,
then the interpretation process can be carried out by general-purpose goal-
reduction  machinery.  It  can  either  be  done  in  real  time  by  a  fixed  circuit  or 
over  time  by  an  anytime  search  procedure.  If  interpretation  is  to  happen  in 
real  time,  there  must  be  a  limit  on  the  number  of  reduction  rules  that  can  be 
applied,  in  order  to  make  the  circuitry  be  of  fixed  size. 


Conclusions 

The  Gapps  goal-reduction  formalism  provides  a  flexible,  declarative  method  for 
describing  the  action  component  of  agents  that  must  operate  in  real-time  in 
dynamic  worlds.  It  has  a  formal  semantic  grounding  and  has  been  implemented 
and  used  in  a  variety  of  robotic  applications.  In  addition,  it  can  be  extended  in 
a  number  of  ways  for  use  in  domains  with  different  types  of  complexity. 
1 Introduction

Autonomous  agents  must  learn  to  act  in  complex,  noisy  domains.  This  paper 
will  provide  a  formal  description  of  the  problem  of  building  autonomous  agents 
that  learn  to  act  and  will  provide  metrics  for  comparing  learning  algorithms 
that  are  appropriate  for  autonomous  agents. 

Why should we build learning agents? A program that “learns” is not
intrinsically better than one that does not. One reason to build learning agents
is  that  it  is  very  difficult  for  humans  to  write  explicit  programs  for  agents  that 
must  work  in  complex,  uncertain  environments.  In  programming  robots,  for 
instance,  it  is  common  for  a  human  programmer  to  learn  a  great  deal  about 
the  operation  of  the  robot’s  sensors  and  effectors  in  the  course  of  debugging 
programs  for  the  robot.  It  would  be  much  easier  and  less  time-consuming  if  the 
programmer were able to articulate only general principles about the environment,
allowing the robot to experiment and learn about its own sensors and
effectors.  Another  reason  for  building  agents  that  learn  to  act  is  that  we  would 
like  to  have  agents  that  are  flexible  enough  to  work  in  a  variety  of  environments, 
adapting  their  perception  and  action  strategies  to  the  world  in  which  they  find 
themselves.  Even  if  a  human  could  completely  specify  the  program  for  an  agent 
to  operate  in  a  particular  environment,  the  agent  would  have  to  be  completely 
reprogrammed  to  move  it  to  a  new  environment. 

In  these  cases,  the  goal  of  the  agent’s  designer  is  to  have  the  agent  learn  what 
actions  it  should  perform  in  which  situations  in  order  to  maximize  an  external 
measure  of  success.  All  of  the  information  the  agent  has  about  the  external  world 

*This work was supported in part by a gift from the System Development Foundation and
in part by the Air Force Office of Scientific Research under contract #F49620-89-C-0055.
is contained in a series of inputs that it receives from the environment. These
inputs  may  encode  information  ranging  from  the  output  of  a  vision  system  to 
a  robot’s  current  battery  voltage.  The  agent  can  be  in  many  different  states  of 
information  about  the  environment,  and  it  must  map  each  of  these  information 
states,  or  situations,  to  a  particular  action  that  it  can  perform  in  the  world.  The 
agent’s  mapping  from  situations  to  actions  is  referred  to  as  an  action  map.  Part 
of  the  agent’s  input  from  the  world  encodes  the  agent’s  reinforcement,  which  is 
a  measure  of  how  well  the  agent  is  performing  in  the  world.  The  agent  should 
learn  to  act  in  such  a  way  as  to  maximize  its  total  reinforcement. 

As  a  concrete  example,  consider  a  simple  robot  with  two  wheels  and  two 
photo-sensors. It can execute five different actions: stop, go forward, go backward,
turn left, and turn right. It can sense three different states of the world:
the  light  in  the  left  eye  is  brighter  than  that  in  the  right  eye,  the  light  in  the  right 
eye  is  brighter  than  that  in  the  left  eye,  and  the  light  in  both  eyes  is  roughly 
equally  bright.  Additionally,  the  robot  is  given  high  values  of  reinforcement 
when  the  average  value  of  light  in  the  two  eyes  is  increased  from  the  previous 
instant.  In  order  to  maximize  its  reinforcement,  this  robot  should  turn  left  when 
the  light  in  its  left  eye  is  brighter,  turn  right  when  the  light  in  its  right  eye  is 
brighter,  and  move  forward  when  the  light  in  both  eyes  is  equal.  The  problem  of 
learning  to  act  is  to  discover  such  a  mapping  from  information  states  to  actions. 
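The target action map for this robot is small enough to write down directly. The Python sketch below assumes the two photo-sensors report numeric brightness values and introduces a hypothetical `tolerance` parameter to decide when the two readings count as roughly equal; the situation and action names are illustrative.

```python
# The mapping the robot should end up with, written out by hand.
ACTION_MAP = {
    "left-brighter": "turn-left",
    "right-brighter": "turn-right",
    "equal": "go-forward",
}


def situation(left: float, right: float, tolerance: float = 0.05) -> str:
    """Classify raw sensor readings into one of the three situations."""
    if left - right > tolerance:
        return "left-brighter"
    if right - left > tolerance:
        return "right-brighter"
    return "equal"


def act(left: float, right: float) -> str:
    """Pure action map: the action depends only on the current situation."""
    return ACTION_MAP[situation(left, right)]
```

A learning agent would not be given `ACTION_MAP`; the point of the learning problem is to discover this table from reinforcement alone.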

Thus,  the  problem  of  learning  to  act  can  be  cast  as  a  function-learning 
problem: the agent must learn a mapping from the situations in which it finds
itself  to  the  actions  it  can  perform.  In  the  simplest  case,  the  mapping  will  be 
a  pure  function,  but  in  general  it  can  have  state,  allowing  the  action  taken  at 
a  particular  time  to  depend  on  any  previous  situation.  In  the  past  few  years 
there  has  been  a  great  deal  of  work  in  the  artificial  intelligence  and  theoretical 
computer  science  communities  on  the  problem  of  learning  pure  Boolean-valued 
functions  [10].  Unfortunately,  this  work  is  not  directly  relevant  to  the  problem 
of  learning  action  maps  because  of  the  different  settings  of  the  problem.  In  the 
traditional  function-learning  work,  a  learning  algorithm  is  presented  with  a  set 
or  series  of  input-output  pairs  that  specify  the  correct  output  to  be  generated 
for that particular input. This setting allows for effective function learning, but
does  not  mirror  the  situation  of  an  agent  trying  to  learn  an  action  map.  The 
agent,  finding  itself  in  a  particular  input  situation,  must  generate  an  action. 
It then receives a reinforcement value from the environment, indicating how
effective that action was. The agent cannot, however, deduce the reinforcement
value that would have resulted from executing any of its other actions. Also, if
the environment is noisy, as it will be in general, just one instance of performing
an action in a situation may not give an accurate picture of the reinforcement
value of that action.

The problem of learning action maps by trial and error is often referred to as
reinforcement learning because of its similarity to models used in psychological
studies of behavior-learning in humans and animals. It can also be classified
as unsupervised learning because correct answers are not provided by a teacher
[14]. One of the most interesting facets of the reinforcement learning problem is
the  tension  between  performing  actions  that  are  not  well  understood  in  order  to 
gain  information  about  their  reinforcement  value  and  performing  actions  that 
are  expected  to  be  good  in  order  to  increase  overall  reinforcement.  If  an  agent 
knows  that  a  particular  action  works  well  in  a  certain  situation,  it  must  trade 
off  performing  that  action  against  performing  another  one  that  it  knows  nothing 
about,  in  case  the  second  action  is  even  better  than  the  first.  Another  important 
aspect  of  the  reinforcement-learning  problem  is  that  the  actions  that  an  agent 
performs  influence  the  input  situations  in  which  it  will  find  itself  in  the  future. 
Rather  than  receiving  an  independently  chosen  set  of  input-output  pairs,  the 
agent  has  some  control  over  what  inputs  it  will  receive  and  complete  control  over 
what  outputs  will  be  generated  in  response.  In  addition  to  making  it  difficult 
to  make  distributional  statements  about  the  inputs  to  the  agent,  this  makes  it 
possible  for  what  seem  like  small  “experiments”  to  cause  the  agent  to  discover 
an  entirely  new  part  of  its  environment. 

Because  of  these  differences  in  the  setting  of  the  learning  task,  algorithms 
(such as Michalski’s star method [13], Mitchell’s version spaces [15,16] and
Valiant’s algorithm for learning k-DNF [25]) and evaluation metrics (such as
PAC-learning  [24]  and  mistake  bounds  [12])  developed  for  traditional  function 
learning  are  not  appropriate  for  learning  to  act.  This  paper  focuses  on  building 
formal  foundations  for  the  problem  of  learning  in  autonomous  agents.  These 
foundations  must  allow  a  clear  statement  of  the  problem  and  provide  a  basis 
for  evaluating  and  comparing  learning  algorithms.  It  is  important  to  establish 
such  a  basis:  there  are  many  instances  [22,0]  in  the  machine  learning  literature 
of  people  doing  interesting  work  on  learning  agents,  but  reporting  the  results  in 
a  way  that  makes  it  difficult  to  compare  them  with  the  results  of  others. 

2  Acting  in  a  Complex  World 

An  autonomous  agent  can  be  seen  as  acting  in  a  world,  continually  executing 
a  function  that  maps  the  agent’s  perceptual  inputs  to  its  effector  outputs.  Its 
world,  or  environment,  is  everything  that  is  outside  the  agent  itself,  possibly 
including other robotic agents or humans. The agent operates in a cycle,
receiving an input from the world, doing some computation, then generating an
output that affects the world. The mapping that it uses may have state or
memory, allowing its action at any time to depend, potentially, on the entire
stream of inputs that it has received until that time. Such a mapping from an
input stream to an output stream is referred to as a behavior.

In  order  to  study  the  effectiveness  of  particular  behaviors,  whether  or  not 
they  involve  learning,  we  must  model  the  connection  between  agent  and  world, 
understanding  how  an  agent’s  actions  affect  its  world  and,  hence,  its  own  input 
stream.

2.1  Modeling  an  Agent’s  Interaction  with  the  World

The world can be modeled as a deterministic finite automaton whose state
transitions depend on the actions of an agent. This model will be extended to
include non-deterministic worlds in the next section. A world can be formally
modeled as the triple (S, A, W), in which S is the set of possible states of the
world, A is the set of possible outputs from the agent to the world (or actions
that can be performed by the agent), and W is the state transition function,
mapping S × A into S. Once the world has been fixed, the agent can be modeled
as the 4-tuple $(\mathcal{I}, I, R, B)$, where $\mathcal{I}$ is the set of possible inputs from the world
to the agent, I is a mapping from S to $\mathcal{I}$ that determines which input the agent
will receive when the world is in a given state, R is the reinforcement function of
the agent that maps S into real numbers (it may also be useful to consider more
limited models in which the output of the reinforcement function is Boolean-valued),
and B is the behavior of the agent, mapping $\mathcal{I}^*$ (streams of inputs) into
A. The expressions i(t) and a(t) will denote the input received by the agent at
time t and the action taken by the agent at time t, respectively.

The  process  of  an  agent’s  interaction  with  the  world  is  depicted  in  Figure  1. 
The  world  is  in  some  internal  state,  s,  which  is  projected  into  i  and  r  by  the 
input and reinforcement functions I and R. These values serve as inputs to
the agent’s behavior, B, which generates an action a as output. Once per
synchronous cycle of this system, the value of a, together with the old value of
world state s, is transformed into a new value of world state s by the world’s
transition function W.

Note that if the agent does not have a simple stimulus-response behavior, but
has some internal state, then the action taken by the behavior can be a function
of both its input and its internal state. This internal state may allow the agent
to  discriminate  among  more  states  of  the  world  and,  hence,  to  obtain  higher 
reinforcement  values  by  performing  more  appropriate  actions.  To  simplify  the 
following  discussion,  actions  will  be  conditioned  only  on  the  input,  but  the 
treatment  is  easily  extended  to  the  case  in  which  the  action  depends  on  the 
agent’s  internal  state  as  well. 
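
The agent-world cycle described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function name run_agent and the two-state toy world are hypothetical and do not come from the report, and the behavior is restricted to the stimulus-response case adopted above.

```python
from typing import Callable, Hashable, List, Tuple

State = Hashable
Action = Hashable
Input = Hashable


def run_agent(
    W: Callable[[State, Action], State],  # world transition function, S x A -> S
    I: Callable[[State], Input],          # input function, S -> set of inputs
    R: Callable[[State], float],          # reinforcement function, S -> reals
    B: Callable[[Input], Action],         # stimulus-response behavior, input -> action
    s: State,                             # initial world state
    steps: int,
) -> List[Tuple[Input, float, Action]]:
    """Run the synchronous cycle and return the trace of (i, r, a) triples."""
    trace = []
    for _ in range(steps):
        i, r = I(s), R(s)   # project the world state into input and reinforcement
        a = B(i)            # the behavior maps the input to an action
        trace.append((i, r, a))
        s = W(s, a)         # the world moves to its next state
    return trace


# Hypothetical two-state world: "flip" toggles the state, "stay" keeps it.
W = lambda s, a: 1 - s if a == "flip" else s
I = lambda s: "dark" if s == 0 else "light"
R = lambda s: 1.0 if s == 1 else 0.0
B = lambda i: "flip" if i == "dark" else "stay"

trace = run_agent(W, I, R, B, s=0, steps=3)
```

In this toy case the agent flips once and then stays in the rewarded state. A behavior with internal state would replace B with an object that also remembers earlier inputs.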

2.2  Inconsistent  Worlds 

One  of  the  most  difficult  problems  that  a  learning  agent  must  contend  with  is 
inconsistency.  A  world  is  said  to  be  inconsistent  for  an  agent  if  it  is  possible  that, 
on two different occasions in which the agent receives the same input and
generates the same action, the next states of the world differ in their reinforcement
or  the  world  changes  state  in  such  a  way  that  the  same  string  of  future  actions 
will  have  different  reinforcement  results.  There  are  many  different  phenomena 
that  can  account  for  inconsistency: 

•  The  agent  does  not  have  the  ability  to  discriminate  among  all  world  states. 
If the agent’s input function I is not one-to-one, which will be the case
in  general,  then  an  individual  input  could  have  arisen  from  many  world 
states.  When  some  of  those  states  respond  differently  to  different  actions, 
the  world  will  appear  inconsistent  to  the  agent. 

•  The  agent  has  “faulty”  sensors.  Some  percentage  of  the  time,  the  world 
is in a state s, which should cause the agent to receive I(s) as input,
but it appears that the world is in some other state s', causing the agent
to receive I(s') as input instead. Along with the probability of error, the
nature of the errors must be specified: are the erroneously perceived states
chosen  maliciously,  or  according  to  some  distribution  over  the  state  space, 
or  contingently  upon  what  was  to  have  been  the  correct  input? 

•  The agent has “faulty” effectors. Some percentage of the time, the agent
generates  action  a,  but  the  world  actually  changes  state  as  if  the  agent 
had  generated  a  different  action  a'.  As  above,  both  the  probability  and 
nature  of  the  errors  must  be  specified. 

•  The  world  has  a  probabilistic  transition  function.  In  this  case,  the  world 
is a stochastic automaton whose transition function, W', actually maps
S × A into a probability distribution over S (a mapping from S into the
interval  [0, 1])  that  describes  the  probability  that  each  of  the  states  in  S 
will  be  the  next  state  of  the  world. 

Some specific cases of the noise phenomena above have been studied in the formal
function-learning literature. Valiant [24] has explored a model of noise in which,
with some small probability, the entire input instance to the agent can be chosen
maliciously. This corresponds, roughly, to having simultaneous faults in sensing
and action that can be chosen in a way that is maximally bad for the learning
algorithm. This model is overly pessimistic and is hard to justify in practical
situations. Angluin [2] works with a model of noise in which input instances are
misclassified with some probability; that is, the output part of an input-output
pair is specified incorrectly. This is a more realistic model of noise, but is not
directly applicable to the action-learning problem under consideration here.

Figure 2: Modeling faulty effectors as a probabilistic world transition function.

If  the  behavior  of  faulty  sensors  and  effectors  is  not  malicious,  each  of  the 
types  of  inconsistency  discussed  above  can  be  described  by  transforming  the 
original world model into one in which the set of world states, S, is identical to
the set of agent inputs, $\mathcal{I}$, and in which the world has a probabilistic transition
function. Reducing each of these phenomena to probabilistic world-transition
functions allows the rest of the discussion of embedded behaviors to ignore the
other possible modes of inconsistency. The remainder of this section shows how
to transform worlds with each type of inconsistency into worlds with state set
$\mathcal{I}$ and probabilistic transition functions.

Consider an agent, embedded in a world with deterministic transition function
W, whose effectors are faulty with probability p, so that when the intended
action is a, the actual action is $\nu(a)$. This agent’s situation can be described by
a probabilistic world transition function W'(s, a) that maps the value of W(s, a)
to the probability value 1 - p, the value of $W(s, \nu(a))$ to the probability value
p, and all other states to probability value 0. That is,

$$
\begin{aligned}
W'(s,a)(W(s,a)) &= 1 - p \\
W'(s,a)(W(s,\nu(a))) &= p
\end{aligned}
$$

The result of performing action a in state s will be W(s,a) with probability 1 - p,
and $W(s,\nu(a))$ with probability p. Figure 2 depicts this transition function.
First, a deterministic transition is made based on the action of the agent; then, a
probabilistic transition is made by the world. This model can be easily extended
if $\nu$ is a mapping from actions to probability distributions over actions. In that
case, for all a' not equal to a, the value of W(s,a') is mapped to the probability
value $p\,\nu(a)(a')$, which is the probability of an error, p, times the probability that
action a' will be executed given that the agent intended to execute the action
a. The value of W(s,a) is mapped to the probability value $1 - p + p\,\nu(a)(a)$,
which is the probability that there is no error, plus the probability that the
error actually maps back to the correct action.

Figure 3: Modeling faulty sensors with multiple probabilistic transitions.
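
The faulty-effector construction above can be sketched as a small Python function. The names faulty_effector_world, W_prime, and the corridor world are hypothetical illustrations, not part of the report; nu is assumed to return a dictionary giving, for an intended action, the distribution over actions executed when an error occurs.

```python
from collections import defaultdict


def faulty_effector_world(W, p, nu):
    """Build W'(s, a) -> {next_state: probability} from a deterministic W.

    W  : deterministic transition function W(s, a) -> next state
    p  : probability that the effectors are faulty on a given step
    nu : nu(a) -> {executed_action: probability}, used only when an error occurs
    """
    def W_prime(s, a):
        dist = defaultdict(float)
        dist[W(s, a)] += 1.0 - p               # no error: intended action executes
        for a_actual, q in nu(a).items():
            dist[W(s, a_actual)] += p * q      # error: a_actual executes instead
        return dict(dist)
    return W_prime


# Hypothetical corridor world with states 0..3 and actions "left" / "right".
def W(s, a):
    return max(s - 1, 0) if a == "left" else min(s + 1, 3)


# Assumed error model: a faulty effector always executes the opposite action.
def nu(a):
    return {"right": 1.0} if a == "left" else {"left": 1.0}


W_prime = faulty_effector_world(W, p=0.1, nu=nu)
dist = W_prime(1, "right")   # probability mass on state 2 (intended) and state 0 (error)
```

Accumulating probabilities per next state, rather than per action, also covers the case in which the erroneous action happens to lead to the same state as the intended one.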

Faulty input sensors are somewhat more difficult to model. Let the agent’s
sensors be faulty with probability p, yielding a value $I(\nu(s))$ rather than I(s).
It is possible to construct a new model with a probabilistic world-transition
function in which the states of the world are those that the agent thinks it is
in. The model can be most simply viewed if the world makes more than one
probabilistic transition, as shown in Figure 3. If it appears that the world is in
state s, then with probability $p_s$, it actually is, and the first transition is to the
same state. The rest of the probability mass is distributed over the other states
in the inverse image of s under $\nu$, $\nu^{-1}(s)$, causing a transition to some world state
s' with probability $p_{s'}$. Next, there is a transition to a new state on the basis of
the agent’s action according to the original transition function W. Finally, with
probability p, the world makes a transition to the state $\nu(W(s',a))$, allowing for
the chance that this result will be misperceived on the next tick. In Figure 4,
this diagram is converted into a more standard one, in which the agent performs
an action, then the world makes a probabilistic transition. This construction
can also be extended to the cases in which $\nu(s)$ is a probability distribution over
S and in which the initial world-transition function is probabilistic.

Figure 4: Modeling faulty sensors as a probabilistic world transition function.

Figure 5: Modeling inability to discriminate among worlds.

To  model  an  agent’s  inability  to  discriminate  among  worlds,  it  is  possible  to 
construct a new model of the world in which the elements of $\mathcal{I}$ are the states,
standing for equivalence classes of the states in the old model. Let $\{s_1, \ldots, s_n\}$
be the inverse image of i under I. There is a probabilistic transition to each
of the $s_j$, based on the probability, $p_j$, that the world is in state $s_j$ given that
the agent received the input i. From each of these states, the world makes a
transition on the basis of the agent’s action, a, to the state $W(s_j, a)$, which
is finally mapped back down to the new state space by the function I. This
process is depicted in Figure 5 and the resulting transition function is shown in
Figure 6.
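
A minimal sketch of this aggregation step follows, assuming the conditional probabilities $p_j$ are simply handed in as a table. The names aggregate_world and belief, and the four-state example, are hypothetical and not taken from the report.

```python
from collections import defaultdict


def aggregate_world(W, I, states, belief):
    """Build W'(i, a) -> {next_input: probability} over the input set.

    W      : deterministic transition function W(s, a) -> next state
    I      : input function I(s) -> input
    states : iterable of all world states
    belief : belief[i][s] = assumed probability that the world is in s given input i
    """
    inverse = defaultdict(list)          # inverse image of each input under I
    for s in states:
        inverse[I(s)].append(s)

    def W_prime(i, a):
        dist = defaultdict(float)
        for s in inverse[i]:
            # move from s under action a, then map the result back down through I
            dist[I(W(s, a))] += belief[i][s]
        return dict(dist)
    return W_prime


# Hypothetical world: states 0..3, the agent only perceives "low" (0, 1) or "high" (2, 3).
states = [0, 1, 2, 3]
I = lambda s: "low" if s < 2 else "high"
W = lambda s, a: min(s + 1, 3) if a == "inc" else s
belief = {"low": {0: 0.5, 1: 0.5}, "high": {2: 0.5, 3: 0.5}}

W_prime = aggregate_world(W, I, states, belief)
dist = W_prime("low", "inc")
```

Here the belief table is fixed by hand purely for illustration; how such probabilities would actually be obtained is a separate question that this sketch does not address.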

In the construction for faulty sensors, it is necessary to evaluate the probability
that the world is in some state $s_k$, given that it appears to the agent to
be in another state s. This probability depends on the unconditional probability
that the world is in the state $s_k$, as well as the unconditional probability
that the world appears to be in the state s. These unconditional probabilities
depend, in the general case, on the behavior that the agent is executing, so
the construction cannot be carried out before the behavior is fixed. A similar
problem exists for the case of lack of discrimination: it is necessary to evaluate
the probability that the world is in each of the individual states in the inverse
image of input i under I given that the agent has input i. These probabilities
also depend on the behavior that is being executed by the agent. This leads to
a very complex optimization problem that is, in its general form, beyond the
scope of this paper.

Figure 6: Modeling inability to discriminate among worlds as a probabilistic
world transition function.

The  rest  of  the  paper  will  be  concerned  only  with  worlds  that  are  globally 
consistent  for  the  learning  agent.  A  world  is  globally  consistent  for  an  agent  if 
and only if for all inputs $i \in \mathcal{I}$ and actions $a \in A$, the expected value of the
reinforcement  given  i  and  a  is  constant.  Global  consistency  allows  for  variations 
in  the  result  of  performing  an  action  in  a  situation,  as  long  as  the  expected, 
or  average,  result  is  the  same.  It  simply  requires  that  there  not  be  variations 
in  the  world  that  are  undetectable  by  the  agent  and  that  affect  its  choice  of 
action.  If  the  transformation  described  above  has  been  carried  out  so  that  the 
sets $\mathcal{I}$ and S are the same, this is tantamount to requiring that the world be a
Markov decision process with stationary transition and output probabilities [11].
In addition, the following discussion will assume that the world is consistent
over  changes  in  the  agent’s  behavior. 

2.3  Learning  Behaviors 

The problem of programming an agent to behave correctly in a world is to choose
some behavior B, given that the rest of the parameters of the agent and world
are fixed. If the programmer does not know everything about the world, or if he
wishes the agent he is designing to be able to operate in a number of different
worlds, he must program an agent that will learn to behave correctly. That is,
he must find a behavior B' that, through changing parts of its internal state on
the basis of its perceptual stream, eventually converges to some behavior B''
that is correct for the world that gave rise to its perceptions. Of course, to say
that a program learns is just to take a particular perspective on a program with
internal state. A behavior with state can be seen as “learning” if parts of its
state eventually converge to some fixed or slowly-varying values. The behavior
that results from those parameters having been fixed in that way can be called
the “learned behavior.”

Figure 7: General algorithm for learning behaviors.

A  learning  behavior  is  an  algorithm  that  learns  an  appropriate  behavior 
for an agent in a world. It is itself a behavior, mapping elements of $\mathcal{I}$ to
elements of A, but it requires the additional input R(s) for every state s, in
order to know the reinforcement value of the state for the agent. A learning
behavior consists of three parts: an initial state $s_0$, an update function u, and
an evaluation function e. At any moment, the internal state encodes whatever
information the learner has chosen to save about its interactions with the world.
The  update  function  maps  an  internal  state  of  the  learner,  an  input,  an  action, 
and  a  reinforcement  value  into  a  new  internal  state,  adjusting  the  current  state 
based  on  the  reinforcement  resulting  from  performing  that  action  in  that  input 
situation.  The  evaluation  function  maps  an  internal  state  and  an  input  into 
an  action,  choosing  the  action  that  seems  most  useful  for  the  agent  in  that 
situation,  based  on  the  information  about  the  world  stored  in  the  internal  state. 
Recall  that  an  action  can  be  useful  for  an  agent  either  because  it  has  a  high 
reinforcement  value  or  because  the  agent  knows  little  about  its  outcome. 

A general algorithm for learning behaviors, based on these three components,
is shown in Figure 7. The internal state is initialized to $s_0$, then the algorithm
loops forever. An input is read from the world and the evaluation function
is applied to the internal state and the input, resulting in an action, which is
then output. At this point, the world changes to a new state. The program
next determines the reinforcement associated with the new situation, uses that
information, together with the last input and action, to update the internal
state, then goes back to the top of its loop. Formulating learning behaviors in
terms of $s_0$, e, and u facilitates building experimental frameworks that allow
testing of different learning behaviors in many different worlds.
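
The loop just described can be sketched in Python as follows. This is a sketch under stated assumptions, not a transcription of Figure 7: the loop is bounded by a step count instead of running forever, and the particular u, e, and ToyWorld shown are hypothetical stand-ins (a table of running totals, and an evaluation function that tries each untried action once before choosing the action with the best observed average).

```python
def run_learner(s0, u, e, world, steps):
    """Generic learning behavior built from an initial state s0, update u, evaluation e."""
    state = s0
    for _ in range(steps):                 # the described loop runs forever; bounded here
        i = world.read_input()
        a = e(state, i)                    # choose an action from internal state and input
        world.act(a)                       # the world changes to a new state
        r = world.read_reinforcement()     # reinforcement of the new situation
        state = u(state, i, a, r)          # fold the experience into the internal state
    return state


ACTIONS = ["a", "b"]                       # hypothetical action set


def u(state, i, a, r):
    """Keep (count, total reinforcement) for every (input, action) pair seen so far."""
    n, total = state.get((i, a), (0, 0.0))
    new_state = dict(state)
    new_state[(i, a)] = (n + 1, total + r)
    return new_state


def e(state, i):
    """Prefer actions never tried for this input, else the best observed average."""
    for a in ACTIONS:
        if (i, a) not in state:
            return a
    return max(ACTIONS, key=lambda a: state[(i, a)][1] / state[(i, a)][0])


class ToyWorld:
    """Hypothetical world with a single input in which only action "b" is rewarded."""
    def __init__(self):
        self.last_action = None

    def read_input(self):
        return "only"

    def act(self, a):
        self.last_action = a

    def read_reinforcement(self):
        return 1.0 if self.last_action == "b" else 0.0


final_state = run_learner({}, u, e, ToyWorld(), steps=10)
```

Because the learner is expressed only through s0, u, and e, a different learning behavior can be tested against the same world by swapping those three arguments.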

3  Performance  Criteria 

In  order  to  compare  algorithms  for  learning  behaviors,  we  must  fix  the  criteria  on 
which they are to be judged. There are three major considerations: correctness,
convergence,  and  time-space  complexity.  First,  we  must  determine  the  correct 
behavior  for  an  agent  in  a  domain.  Then  we  can  measure  to  what  degree  a 
learned  behavior  approximates  the  correct  behavior  and  the  speed,  in  terms  of 
the  number  of  interactions  with  the  world,  with  which  it  converges.  We  must 
also  be  concerned  with  the  amount  of  time  and  space  needed  for  computing  the 
update  and  evaluation  functions  and  with  the  size  of  the  internal  state  of  the 
algorithm. 

As well as comparing the performance of different algorithms for a particular
world, it is useful to study the way different performance measures of an
algorithm vary as a function of independent variables that characterize a world.
Such independent variables might include the sizes of $\mathcal{I}$ and A and the values
of the performance measures on the random algorithm (one that chooses
among the available actions randomly at each time step). These kinds of comparisons
are not pursued further in this paper, but are enabled once objective
performance  criteria  are  chosen. 

3.1  Correctness 

When  shall  we  say  that  a  behavior  is  correct  for  an  agent  in  an  environment? 
There  are  many  possible  answers  that  will  lead  to  different  learning  algorithms 
and  analyses.  An  important  quantity  is  the  expected  reinforcement  that  the 
agent  will  receive  in  the  next  instant,  given  that  the  current  input  is  i(t)  and 
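the current action is a(t), which can be expressed as

$$
\begin{aligned}
\mathit{er}(i(t), a(t)) &= E\bigl(R(i(t+1)) \mid i(t), a(t)\bigr) \\
&= \sum_{i' \in \mathcal{I}} R(i')\, W'(i(t), a(t))(i') .
\end{aligned}
$$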

It  is  the  sum,  over  all  possible  next  states,  of  the  probability  that  the  world  will 
make  a  transition  to  that  state  times  its  reinforcement  value.  This  formulation 
assumes  that  the  inputs  directly  correspond  to  the  states  of  the  world  and  that 
W'  is  a  probabilistic  transition  function.  If  the  world  is  globally  consistent 
for  the  agent,  the  process  is  Markov  and  the  times  are  irrelevant  in  the  above 
definition,  allowing  it  to  be  restated  as 

$$
\mathit{er}(i, a) = \sum_{i' \in \mathcal{I}} R(i')\, W'(i, a)(i') .
$$

One of the simplest criteria is that a behavior is correct if, at each step, it
does the action that is expected to cause the most reinforcement to be received
on  the  next  step.  A  correct  behavior,  in  this  case,  is  one  that  generates  actions 
that  are  optimal  under  the  following  definition: 

$$
\forall i \in \mathcal{I}, a \in A.\ \mathrm{Opt}(i, a) \leftrightarrow \forall a' \in A.\ \mathit{er}(i, a) \geq \mathit{er}(i, a') .
$$

Optimal  behavior  is  defined  as  a  relation  on  inputs  and  actions  rather  than  as  a 
function,  because  there  may  be  many  actions  that  are  equally  good  for  a  given 
input.  However,  it  can  be  made  into  a  function  by  breaking  ties  arbitrarily. 
This  is  a  local  criterion  that  may  cause  the  agent  to  sacrifice  promises  of  future 
reinforcement  for  immediately  attainable  current  reinforcement. 
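
As a small illustration of the local criterion, the following sketch computes the one-step expected reinforcement and the Opt relation for a toy world. The table layout (W_prime keyed by input-action pairs) and the two-input world are assumptions made for this example only.

```python
def er(R, W_prime, i, a):
    """One-step expected reinforcement: sum over next inputs of R(i') * W'(i, a)(i')."""
    return sum(prob * R[i_next] for i_next, prob in W_prime[(i, a)].items())


def opt(R, W_prime, actions, i, tol=1e-12):
    """Set of actions related to input i by Opt (a set, because ties are possible)."""
    values = {a: er(R, W_prime, i, a) for a in actions}
    best = max(values.values())
    return {a for a, v in values.items() if v >= best - tol}


# Hypothetical two-input world in which inputs coincide with world states.
R = {"x": 0.0, "y": 1.0}
actions = ["go", "wait"]
W_prime = {
    ("x", "go"): {"y": 0.8, "x": 0.2},
    ("x", "wait"): {"x": 1.0},
    ("y", "go"): {"x": 1.0},
    ("y", "wait"): {"y": 0.5, "x": 0.5},
}

best_in_x = opt(R, W_prime, actions, "x")   # {"go"}
best_in_y = opt(R, W_prime, actions, "y")   # {"wait"}
```

Returning a set mirrors the definition of Opt as a relation; picking any one element of the set breaks ties arbitrarily.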

The concept of expected reinforcement can be made more global by considering
the total expected reinforcement for a finite future interval, or horizon,
given  that  an  action  was  taken  in  a  particular  input  situation.  This  is  often 
termed  the  value  of  an  action,  and  it  is  computed  with  respect  to  a  particular 
behavior  (because  the  value  of  the  next  action  taken  depends  crucially  on  how 
the  agent  will  behave  after  that).  In  the  following,  expected  reinforcement  is 
computed  under  the  assumption  that  the  agent  will  act  optimally  the  rest  of  the 
time.  The  expected  reinforcement,  with  horizon  k,  of  doing  action  a  in  input 
situation  i  at  time  t  is  defined  as 

$$
\mathit{er}_k(i(t), a(t)) = E\Bigl(\sum_{j=1}^{k} R(i(t+j)) \Bigm| i(t), a(t), \forall h < k.\ \mathrm{Opt}_{k-h}(i(t+h), a(t+h))\Bigr) .
$$

This expression can be simplified to a recursive, time-independent formulation,
in which the k-step value of an action in a state is just the one-step value of the
action in the state plus the (k-1)-step value of the optimal action in the following
state:

$$
\mathit{er}_k(i, a) = \mathit{er}(i, a) + \sum_{i' \in \mathcal{I}} W'(i, a)(i')\, \mathit{er}_{k-1}(i', \mathrm{Opt}_{k-1}(i')) .
$$

This definition is recursively dependent on the definition of optimality k steps
into the future, $\mathrm{Opt}_k$:

$$
\forall i \in \mathcal{I}, a \in A.\ \mathrm{Opt}_k(i, a) \leftrightarrow \forall a' \in A.\ \mathit{er}_k(i, a) \geq \mathit{er}_k(i, a') .
$$

The values of $\mathit{er}_1$ and $\mathrm{Opt}_1$ are just er and Opt given above. The k-step value
of action a in situation i at time t, $\mathit{er}_k(i, a)$, can be computed by dynamic
programming. First, the $\mathrm{Opt}_1$ relation is computed; this allows the $\mathit{er}_2$ function
to be calculated for all i and a. Proceeding for k steps will generate the value
for $\mathit{er}_k$. Because of the assumption that the world is Markov, these values are
not  dependent  on  the  time.  However,  if  k  is  large,  the  computational  expense 
of  this  method  is  prohibitive. 
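
The dynamic-programming computation of the k-step values can be sketched as below. The function name finite_horizon_values and the toy world are hypothetical; the world is assumed to be given as explicit tables, with inputs identified with world states.

```python
def finite_horizon_values(R, W_prime, inputs, actions, k):
    """Return the k-step values as a dict {(i, a): er_k(i, a)}, for k >= 1."""
    def er1(i, a):
        return sum(p * R[n] for n, p in W_prime[(i, a)].items())

    values = {(i, a): er1(i, a) for i in inputs for a in actions}      # er_1
    for _ in range(k - 1):
        # value of acting optimally from each input with the shorter horizon
        best = {i: max(values[(i, a)] for a in actions) for i in inputs}
        values = {
            (i, a): er1(i, a) + sum(p * best[n] for n, p in W_prime[(i, a)].items())
            for i in inputs
            for a in actions
        }
    return values


# Hypothetical two-input world (inputs coincide with world states).
R = {"x": 0.0, "y": 1.0}
inputs = ["x", "y"]
actions = ["go", "wait"]
W_prime = {
    ("x", "go"): {"y": 0.8, "x": 0.2},
    ("x", "wait"): {"x": 1.0},
    ("y", "go"): {"x": 1.0},
    ("y", "wait"): {"y": 0.5, "x": 0.5},
}

er2 = finite_horizon_values(R, W_prime, inputs, actions, k=2)
```

Each pass costs on the order of |inputs|² × |actions| operations for a dense transition table, and k passes are needed, which is where the expense for large k comes from.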

Another way to define global optimality is to consider an infinite sum of future
reinforcement values in which values nearer in time are weighted more heavily
than more distant ones. This is referred to as a discounted sum, depending on the
parameter $\gamma$ to specify the rate of discounting. Expected discounted reinforcement
at time t is defined as

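$$
\mathit{er}_\gamma(i(t), a(t)) = E\Bigl(\sum_{j=1}^{\infty} \gamma^{j-1} R(i(t+j)) \Bigm| i(t), a(t), \forall h > 0.\ \mathrm{Opt}_\gamma(i(t+h), a(t+h))\Bigr) .
$$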

Properties  of  the  exponential  allow  us  to  reduce  this  expression  to 

$$
\mathit{er}(i(t), a(t)) + \gamma\, \mathit{er}_\gamma(i(t+1), a(t+1)) ,
$$

which  can  be  expressed  independent  of  time  as 

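$$
\mathit{er}_\gamma(i, a) = \mathit{er}(i, a) + \gamma \sum_{i' \in \mathcal{I}} W'(i, a)(i')\, \mathit{er}_\gamma(i', \mathrm{Opt}_\gamma(i')) .
$$

The related definition of $\gamma$-discounted optimality is given by

$$
\forall i \in \mathcal{I}, a \in A.\ \mathrm{Opt}_\gamma(i, a) \leftrightarrow \forall a' \in A.\ \mathit{er}_\gamma(i, a) \geq \mathit{er}_\gamma(i, a') .
$$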

For a given value of $\gamma$ and a proposed definition of $\mathrm{Opt}_\gamma$, $\mathit{er}_\gamma$ can be found
by solving a system of equations, one for each possible instantiation of its arguments.
A dynamic programming method called policy iteration [20] can be
used in conjunction with that solution method to adjust policy $\mathrm{Opt}_\gamma$ until it is
truly the optimal behavior. This definition of optimality is more widely used
than finite-horizon optimality because its exponential form makes it more computationally
tractable. It is also an intuitively satisfying model, with slowly
diminishing  importance  attached  to  events  in  the  distant  future. 
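
A compact sketch of policy iteration for the discounted criterion is given below. Two simplifications are assumptions of this example rather than anything stated in the text: the system of equations for a fixed policy is solved by repeated substitution to a tolerance instead of by a direct linear solve, and ties between actions are broken by list order. The function name and the toy world are hypothetical.

```python
def policy_iteration(R, W_prime, inputs, actions, gamma, tol=1e-12):
    """Return (policy, V), where V[i] is the discounted value of following policy from i."""
    def er(i, a):
        return sum(p * R[n] for n, p in W_prime[(i, a)].items())

    def q(i, a, V):
        # discounted value of doing a in i, then following the policy that V describes
        return er(i, a) + gamma * sum(p * V[n] for n, p in W_prime[(i, a)].items())

    policy = {i: actions[0] for i in inputs}
    while True:
        # Policy evaluation: iterate the fixed-policy equations until they stop changing.
        V = {i: 0.0 for i in inputs}
        while True:
            new_V = {i: q(i, policy[i], V) for i in inputs}
            delta = max(abs(new_V[i] - V[i]) for i in inputs)
            V = new_V
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated values.
        new_policy = {i: max(actions, key=lambda a: q(i, a, V)) for i in inputs}
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Hypothetical two-input world (inputs coincide with world states).
R = {"x": 0.0, "y": 1.0}
inputs = ["x", "y"]
actions = ["go", "wait"]
W_prime = {
    ("x", "go"): {"y": 0.8, "x": 0.2},
    ("x", "wait"): {"x": 1.0},
    ("y", "go"): {"x": 1.0},
    ("y", "wait"): {"y": 0.5, "x": 0.5},
}

policy, V = policy_iteration(R, W_prime, inputs, actions, gamma=0.9)
```

For larger input sets, the inner evaluation loop would normally be replaced by a direct solution of the linear system, as the text suggests.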

As  an  illustration  of  these  different  measures  of  optimality,  consider  the  world 
depicted  in  Figure  8.  In  state  0,  the  agent  has  a  choice  as  to  whether  to  go  right 
or  left;  in  all  other  states  the  world  transition  is  the  same  no  matter  what  the 
agent does. In the left loop, the only reinforcement comes at the last state before
state 0, but it has value 6. In the right loop, each state has reinforcement value
1. Thus, the average reinforcement is higher around the left loop, but it comes
sooner around the right loop. The agent must decide what action to take in
state 0. Different definitions of optimality lead to different choices of optimal
action.

Figure 9: Plot of expected return against horizon k. Solid line indicates strategy
of going left first, then behaving optimally. Dashed line indicates strategy of
going right first, then behaving optimally.

Under the local definition of optimality, we have er(0, L) = 0 and er(0, R) =
1.  The  expected  return  of  going  left  is  0  and  of  going  right  is  1,  so  the  optimal 
action  would  be  to  go  right. 

Using  the  finite  horizon  definition  of  optimality,  which  action  is  optimal 
depends  on  the  horizon.  For  very  short  horizons,  it  is  clearly  better  to  go  right. 
When  the  horizon,  k,  is  5,  it  becomes  better  to  go  left.  A  general  rule  for  optimal 
behavior  is  that  when  in  state  0,  if  the  horizon  is  5  or  more,  go  left,  otherwise  go 
right.  Figure  9  shows  a  plot  of  the  values  of  going  left  (solid  line)  and  going  right 
(dashed  line)  initially,  assuming  that  all  choices  are  made  optimally  thereafter. 
We  can  see  that  going  right  is  initially  best,  but  it  is  dominated  by  going  left 
for all $k \geq 5$.

Finally, we can consider discounted expected value. Figure 10 shows a plot
of the values of the strategies of always going left at state 0 (solid line) and
always going right at state 0 (dashed line) plotted as a function of $\gamma$. When
there is a great deal of discounting ($\gamma$ is small), it is best to go right because
the reward happens sooner. As $\gamma$ increases, going left becomes better, and at
approximately $\gamma$ = 9.15, going left dominates going right.

Figure 10: Plot of expected return against discount factor $\gamma$. Solid line indicates
strategy of always going left. Dashed line indicates strategy of always going
right.

One way to design learning behaviors that have these difficult kinds of global
optimality is to divide the problem into two parts: transducing the global
reinforcement signal into a local reinforcement signal and learning to perform the
locally best action. The global reinforcement signal is the stream of values of
$R(i(t))$ that come from the environment. The optimal local reinforcement signal,
$\hat{R}(i(t))$, can be defined as $R(i(t)) + \gamma\, er_\gamma(i(t), \mathrm{Opt}_\gamma(i(t)))$. It is the value
of the state $i(t)$ assuming that the agent acts optimally. As shown by Sutton
[22], this signal can be approximated by the value of the state $i(t)$ given that
the agent acts as it is currently acting. Sutton's temporal difference (TD)
algorithm provides a way of learning to generate the local reinforcement signal
from the global reinforcement signal in such a way that, if combined with a
correct local learning algorithm, it will converge to the true optimal local
reinforcement values [22,23]. A complication introduced by this method is that,
from the local behavior-learner's point of view, the world is not stationary. This
is because it takes time for the TD algorithm to converge and because changes
in the behavior cause changes in the values of states and therefore in the local
reinforcement function.
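
The transduction step can be illustrated with a minimal sketch, assuming the simplest tabular form of temporal-difference learning, TD(0). The three-state cycle, the step size `alpha`, and the function names are hypothetical and are not taken from the report; the returned target plays the role of the local reinforcement signal for the state being updated.

```python
def td0_update(values, state, reward, next_state, gamma, alpha):
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    target = reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])
    return target  # candidate local reinforcement signal for `state`


def run_chain(sweeps=2000, gamma=0.9, alpha=0.1):
    """Hypothetical world: deterministic cycle 0 -> 1 -> 2 -> 0.

    A global reinforcement of 1 is received only on leaving state 2.
    """
    values = [0.0, 0.0, 0.0]
    state = 0
    for _ in range(sweeps * 3):
        next_state = (state + 1) % 3
        reward = 1.0 if state == 2 else 0.0
        td0_update(values, state, reward, next_state, gamma, alpha)
        state = next_state
    return values


VALUES = run_chain()
```

Because the behavior in this toy world never changes, the value estimates settle; in a learning agent the behavior changes as well, which is the source of the non-stationarity noted above.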

The  following  discussion  will  be  in  terms  of  some  definition  of  the  optimality 
of an action for a situation, Opt(i,a), which can be defined in any of the three
ways  above,  or  in  some  novel  way  that  is  more  appropriate  for  the  domain  in 
which  a  particular  agent  is  working. 
3.2  Convergence

Correctness  is  a  binary  criterion:  either  a  behavior  is  or  is  not  correct  for  its 
world.  Since  correctness  requires  that  the  behavior  perform  the  optimal  actions 
from  the  outset,  it  is  unlikely  that  any  “learning”  behavior  will  ever  be  correct. 
Using  a  definition  of  correctness  as  a  reference,  however,  it  is  possible  to  develop 
other measures of how close particular behaviors come to the optimal behavior.
This section will consider two different classes of methods for characterizing how
good or useful a behavior is in terms of its relation to the optimal behavior.

3.2.1  Classical Convergence Measures

Early  work  in  the  theory  of  machine  learning  [5,7]  was  largely  concerned  with 
learning  in  the  limit.  Researchers  were  interested  in  characterizing  whether  or 
not  a  learning  strategy  would  converge  to  the  correct  behavior  in  the  limit.  A 
behavior  converges  to  the  optimal  behavior  in  the  limit  if  there  is  some  time 
after  which  every  action  taken  by  the  behavior  is  the  same  as  the  action  that 
would  have  been  taken  by  the  optimal  behavior.  Work  in  learning-automata 
theory  has  relaxed  the  requirements  of  learning  in  the  limit  by  applying  different 
definitions  of  probabilistic  convergence  to  the  sequence  of  internal  states  of  a 
learning  automaton  [18]. 

An important recent development in this area is a model of Boolean-function
learning algorithms that are probably approximately correct (PAC) [24,2], that
is, that have a high probability of converging to a function that closely
approximates the optimal function. The correctness of a function is measured with
respect to a fixed probability distribution on the input instances: a function is
said to approximate another function to degree $\epsilon$ if the probability that they will
disagree on any instance chosen according to the given probability distribution
is less than $\epsilon$. This model requires that there be a fixed distribution over the
input instances and that each input to the algorithm be drawn according to
that distribution.

For  an  agent  to  act  effectively  in  the  world,  its  inputs  must  provide  some 
information  about  the  state  that  the  world  is  in.  In  general,  when  the  agent 
performs  an  action  it  will  bring  about  a  change  in  the  state  of  the  world  and, 
hence,  a  change  in  the  information  the  agent  receives  about  the  world.  Thus, 
it  will  be  very  unlikely  that  such  an  agent’s  inputs  could  be  modeled  as  be¬ 
ing  drawn  from  a  fixed  distribution,  making  PAC-convergence  an  inappropriate 
model  for  autonomous  agents. 

In  addition,  the  PAC-learning  model  is  distribution-independent — it  seeks 
to  make  statements  about  the  performance  of  algorithms  no  matter  how  the 
input  instances  are  distributed.  As  Buntine  has  pointed  out  [6],  its  predictions 
are  often  overly  conservative  for  situations  in  which  there  is  a  priori  information 
about the distribution of the input instances, or in which something is known
about  the  actual  sample,  such  as  how  many  distinct  elements  it  contains. 


3.2.2  Measuring  Error  over  an  Agent’s  Lifetime 

None  of  the  classical  convergence  measures  take  into  account  the  behavior  of  the 
agent  during  the  period  in  which  it  converges.  Instead,  they  make  what  is,  for 
an  agent  acting  in  the  world,  an  artificial  distinction  between  a  learning  phase 
and  an  acting  phase.  Autonomous  agents  that  have  extended  run  times  will 
be  expected  to  learn  for  their  entire  lifetime.  Because  they  may  not  encounter 
certain  parts  or  aspects  of  their  environments  until  arbitrarily  late  in  the  run, 
it  is  inappropriate  to  require  mistakes  to  be  made  before  some  fixed  deadline. 

Another way of characterizing the performance of a function-learning
algorithm is to count the divergences it makes from the optimal function.
Littlestone [12] has investigated this model extensively, characterizing the optimal
number of ‘mistakes’ for a Boolean-function learner and presenting algorithms
that perform very well on certain classes of Boolean functions. This model
is intuitively pleasing, making no restrictive division into learning and acting
phases, but it is not presented as being suited to noisy or inconsistent domains.
However, by assimilating the inconsistency of the domain into the definition
of the target function, as in the requirement for optimal behavior, Opt, we can
make use of mistake bounds in inconsistent domains. A behavior is said to make
an avoidable mistake if, given some input instance i, it generates action a and
Opt(i,a) does not hold; that is, there was some other action that would have
had a higher expected reinforcement.

Avoidable mistake bounds take into account the fact that many mistakes
cannot be avoided by an agent with limited sensory abilities and unreliable
effectors. However, that measure is not entirely appropriate, because every
non-optimal choice of action is considered to be a mistake of the same magnitude.
The expected error of an action a given an input i, err(a,i), is defined to be

$$err(a, i) = er(a', i) - er(a, i),$$

in which a' is any action such that opt(a',i). The expected error associated with
an optimal action is 0; for a non-optimal action, it is just the decrease in expected
reinforcement due to having executed that action rather than an optimal one.
The error of a behavior, either in the limit, or for runs of finite length, can
be measured by summing the errors of the actions it generates. This value,
referred to in the statistics literature as the regret of a strategy [4], represents
the expected amount of reinforcement lost due to executing this behavior rather
than an optimal one. This is an appropriate performance metric for agents
embedded in inconsistent environments because it measures expected loss of
reinforcement, which is precisely what we would like to minimize in our agents.
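
As a small worked illustration of these definitions, the sketch below computes err(a, i) and the regret of a finite run. It assumes the expected reinforcements er(a, i) are known and stored in a table; the table `ER`, the situation and action names, and the run `RUN` are hypothetical values chosen only for the example.

```python
def expected_error(er, action, situation):
    """err(a, i): expected reinforcement lost by taking `action` in
    `situation` instead of an action with maximal expected reinforcement."""
    best = max(er[situation].values())
    return best - er[situation][action]


def regret(er, history):
    """Sum of expected errors over a run of (situation, action) pairs."""
    return sum(expected_error(er, a, i) for i, a in history)


# Hypothetical expected reinforcements, indexed as ER[situation][action].
ER = {
    "i0": {"left": 0.9, "right": 0.4},
    "i1": {"left": 0.2, "right": 0.7},
}
RUN = [("i0", "left"), ("i0", "right"), ("i1", "left"), ("i1", "right")]
RUN_REGRET = regret(ER, RUN)  # two optimal choices and two that each lose 0.5
```

Unlike a count of avoidable mistakes, this total grows with the size of each loss: a near-optimal action adds little, and a badly wrong one adds a lot.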

In many situations, the optimal behavior is unknown or difficult to compute,
which makes it difficult to calculate the error of a given behavior. It is still
possible to use this measure to compare two different behaviors for the same
agent and environment. The expected reinforcement for an algorithm over some
time period can be estimated by running it several times and averaging the
resulting total reinforcements. Because expectations are additive, the difference
between the expected error of two algorithms is the same, in magnitude, as the
difference between their expected total reinforcement values. Thus, the difference
between average reinforcements is a valid basis for comparison, providing a measure
of a behavior’s correctness that is independent of the internal architecture of
the algorithm and that can be used to compare results across a wide variety of
techniques.
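
The comparison procedure described here can be sketched as follows, assuming a hypothetical two-action world with Bernoulli reinforcement. The payoff probabilities, run lengths, and the two fixed behaviors are illustrative assumptions, not values from the report.

```python
import random


def total_reinforcement(behavior, payoff_probs, steps, rng):
    """Total reinforcement collected in one run of `behavior`."""
    total = 0
    for _ in range(steps):
        action = behavior(rng)
        total += 1 if rng.random() < payoff_probs[action] else 0
    return total


def average_total(behavior, payoff_probs, steps=200, runs=500, seed=0):
    """Estimate expected total reinforcement by averaging over several runs."""
    rng = random.Random(seed)
    totals = [total_reinforcement(behavior, payoff_probs, steps, rng)
              for _ in range(runs)]
    return sum(totals) / runs


def always_action_1(rng):
    return 1


def coin_flip(rng):
    return rng.randrange(2)


PAYOFF = {0: 0.3, 1: 0.8}  # hypothetical success probability of each action
GAP = average_total(always_action_1, PAYOFF) - average_total(coin_flip, PAYOFF)
```

With these numbers the expected gap is 200 * (0.8 - 0.55) = 50 units of reinforcement, and it can be estimated without knowing which behavior is optimal.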

3.3  Time  and  Space  Complexity 

Autonomous agents must operate in the real world, continually receiving inputs
from and performing actions on their environment. Because the world changes
dynamically, an autonomous agent must be reactive: always aware of and reacting
to changes in its environment. To ensure reactivity, an agent must operate
in real time; that is, its sense-compute-act cycle must keep pace with the
unfolding of important events in the environment. The exact constraints on the
reaction time of an agent are often difficult to articulate, but it is clear that, in
general, unbounded computation must never take place.

A  convenient  way  to  guarantee  real-time  performance  is  to  require  that  the 
behavior  spend  only  a  constant  amount  of  time,  referred  to  as  a  ‘tick,’  generating 
an  action  in  response  to  each  input.  If  the  behavior  is  a  learning  behavior, 
the  learning  process  must  also  spend  only  a  constant  amount  of  time  on  each 
input instance. There are two strategies for designing such a learning system:
incremental  and  batch. 

An  incremental  system  processes  each  new  data  set  or  learning  instance  as 
it  arrives  as  input.  The  processing  must  be  efficient  enough  that  the  system 
is  always  ready  for  new  data  when  it  arrives.  If  new  relevant  data  can  arrive 
every  tick,  the  learning  algorithm  must  spend  only  one  constant  tick’s  worth  of 
time on each instance. The requirement for incrementality can, theoretically,
be  relaxed  to  yield  a  batch  system,  in  which  a  number  of  learning  instances  are 
collected,  then  processed  for  many  ticks.  As  long  as  the  learning  system  adheres 
to  the  tick  discipline,  this  process  need  not  interfere  with  the  reactiveness  of  the 
rest  of  the  system.  Working  in  batch  mode  may  limit  the  usefulness  of  the 
learning  system  to  some  degree,  however,  because  the  system  will  be  working 
with  old  data  that  may  not  reflect  the  current  situation  and  it  will  force  the 
data  that  arrives  during  the  computation  phase  to  be  ignored.  When  using  this 
method,  the  input  data  must  be  sampled  with  care,  in  order  to  avoid  statistical 
distributions  of  inputs  that  do  not  reflect  those  of  the  external  world. 

An algorithm can be said to be strictly incremental if it uses a bounded
amount of time and space throughout its entire lifetime. This is in contrast
with such approaches as Kibler and Aha’s instance-based learning [1], which
is incremental in that it processes one instance at a time, but is not strictly
incremental because instances are stored in a memory whose size may increase
without bound. For an incremental system that processes one instance per tick
to perform in real time, it must be strictly incremental.
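
The distinction can be made concrete with a minimal sketch that estimates an average reinforcement in two ways. Both class names are hypothetical, and the example ignores the slow growth of the integer counter itself.

```python
class RunningMean:
    """Strictly incremental: fixed-size state, constant work per instance."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


class StoredMean:
    """Incremental but not strictly so: every instance is kept in memory,
    and the work per update grows with the number of stored instances."""

    def __init__(self):
        self.instances = []

    def update(self, x):
        self.instances.append(x)
        return sum(self.instances) / len(self.instances)


running, stored = RunningMean(), StoredMean()
for r in [1, 0, 1, 1]:
    running.update(r)
    stored.update(r)
```

Both objects report the same estimate, but only the first can guarantee a fixed tick length over an arbitrarily long run.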

The  amount  of  time  an  incremental  behavior  spends  on  each  input  should 
not  vary  as  a  function  of  the  number  of  inputs  that  have  been  received.  It 
will,  however,  depend  on  the  size  of  the  input  and  the  output,  but  that  is  fixed 
at  design  time.  This  allows  the  programmer  to  know  how  long  each  tick  of 
the  learning  behavior  will  take  to  compute  on  the  available  hardware  and  to 
compare  that  rate  with  the  pace  of  events  in  the  world.  Any  formalization  of 
the  interaction  between  an  agent  and  its  world  will  depend  on  the  rate  of  the 
interaction;  behaviors  that  work  at  rates  different  from  the  chosen  one  will  es¬ 
sentially  be  working  in  a  different  environment.  The  expected  values  of  optimal 
behaviors  for  different  reaction  rates  will  be  quite  different.  In  general,  up  to 
some  minimum  value,  the  faster  an  agent  can  interact  with  the  world,  the  better 
(otherwise  the  agent  does  not  have  time  to  avert  impending  bad  events),  so  we 
should  strive  for  the  most  efficient  algorithms  possible,  though  a  slow  algorithm 
with  better  convergence  properties  might  be  preferable  to  a  fast  algorithm  that 
was  far  from  optimal. 

Complex  agents,  such  as  mobile  robots,  with  a  wide  variety  of  sensors  and 
effectors  will  have  a  huge  number  of  possible  inputs  and  outputs.  If  algorithms 
for  these  agents  are  to  be  practical,  they  must  have  time  and  space  complexity 
that is polynomial in the number of input bits, lg(|I|), and the number of
output bits, lg(|A|), rather than in the number of inputs and outputs. This
will  probably  only  be  achievable  by  limiting  the  class  of  behaviors  that  can  be 
learned  by  the  agent. 

4  Related  Work 

The  problem  of  learning  the  structure  of  a  finite-state  automaton  from  examples 
has been studied by many theoreticians, including Moore [17], Gold [8] and,
more  recently,  Rivest  and  Schapire  [19].  This  is  a  very  difficult  problem  that  has 
only  been  studied  in  the  case  of  deterministic  automata.  If  the  entire  structure 
of  the  world  can  be  learned,  it  is  conceptually  straightforward  to  compute  the 
optimal  behavior.  It  is  important  to  note,  however,  that  learning  an  action- 
map  that  maximizes  reinforcement  is  not  necessarily  as  complex  as  learning  the 
world’s  transition  function. 

A  number  of  different  groups  of  researchers  have  considered  the  problem 
of  designing  algorithms  for  reinforcement  learning  and,  in  the  process,  have 
addressed  the  issue  of  measures  for  performance  of  reinforcement  learning  algo¬ 
rithms. 

Statisticians have studied reinforcement learning in the guise of k-armed
bandit problems, in which the agent has k possible actions and only reinforcement
as input [4]. The work has primarily concerned the existence of optimal
strategies given various kinds of a priori information about the possible distributions
of the payoffs of the individual arms. The notion of regret was developed in the
context of choosing the optimal behavior in the minimax setting, in which the
worst is assumed about the world. These strategies are, in general,
computationally intractable and require, except in the minimax case, information that
is unavailable in the current setting of the problem.

More  appropriate  to  agents  that  must  learn  to  behave  in  the  world  is  the 
work  of  researchers  in  the  field  of  learning  automata  [18].  They  classify  their 
algorithms  according  to  whether  they  are  expedient  (better  than  the  random 
strategy), optimal, or $\epsilon$-optimal (some parameter can be chosen to make the
behavior arbitrarily “close” to optimal). In addition, there are methods for
characterizing the convergence rate for some learning-automata algorithms. These
evaluation  methods  are  tailored  for  the  case  in  which  the  learning  behavior’s 
only  internal  state  is  a  vector  of  probabilities,  one  for  each  possible  action,  that 
characterize  the  probability  of  the  agent  performing  that  action.  Also,  no  con¬ 
sideration  is  made  of  the  effectiveness  of  the  algorithm  during  the  time  before 
it  converges. 

Watkins  [27]  presents  a  very  clear  discussion  of  different  types  of  optimality 
from  an  operations-research  perspective  and  characterizes  possible  algorithms 
for  learning  optimal  behavior  from  delayed  rewards.  Williams  [28]  presents 
a  theoretical  view  of  a  connectionist  reinforcement-learning  algorithm  [3]  as 
a  form  of  gradient  search.  Sutton  [22,23]  shows  how  to  divide  the  problem 
of  learning  from  delayed  reinforcement  into  the  problems  of  locally  optimal 
behavior  learning  and  secondary  reinforcement-signal  learning. 

5  Conclusion 

This  paper  has  studied  the  problem  of  building  agents  that  learn  about  acting 
in  complex,  inconsistent  environments.  It  has  established  local  and  global  defi¬ 
nitions  of  optimality  of  behaviors  in  non-deterministic  worlds  and  has  provided 
an  implementation-independent  measure  of  deviation  from  the  optimal.  This 
framework  for  the  comparison  of  algorithms  will  allow  researchers  to  develop 
new  algorithms  and  compare  them  rigorously  to  one  another. 

A  particularly  interesting  direction  to  pursue  is  how  to  make  the  algorithms 
more  efficient  in  time  and  space  and  closer  to  optimal  in  behavior  by  making 
assumptions  about  the  environment.  Examples  of  this  are  Van  de  Velde’s  work 
on  learning  to  optimize  usefulness  of  results  rather  than  their  correctness  [26] 
and  Russell’s  use  of  determinations  [21].  The  only  hope  for  the  machine  learning 
enterprise  is  that  there  are  regularities  in  the  world  that  will  make  efficient 
learning  possible. 
Abstract

An  agent  that  must  learn  to  act  in  the  world 
by  trial  and  error  faces  the  reinforcement 
learning  problem,  which  is  quite  different 
from  standard  concept  learning.  Although 
good  algorithms  exist  for  this  problem  in 
the  general  case,  they  are  quite  inefficient. 

One strategy is to find restricted classes of
action strategies that can be learned more
efficiently. This paper pursues that strategy
by developing algorithms that can efficiently
learn action maps that are expressible
in k-DNF. Both connectionist and classical
statistics-based algorithms are presented,
then compared empirically on three test
problems. Modifications and extensions that
will allow the algorithms to work in more
complex domains are also discussed.

1  Reinforcement  Learning 

Consider an agent that must learn to act in the world.
At  each  moment  in  time,  it  gets  information  about 
the  world  from  its  sensors  and  must  choose  an  action 
to  take.  Having  executed  an  action,  the  agent  gets  a 
signal  from  the  world  that  indicates  how  well  the  agent 
is  performing;  we  shall  call  this  a  reinforcement  signal. 
The  reinforcement  signal  can  be  binary  or  real-valued 
and  it  will  typically  be  noisy. 

This  learning  scenario  is  quite  different  from  stan¬ 
dard  concept  learning,  in  which  a  teacher  presents 
the  learner  with  a  set  of  input/output  pairs.  In  the 
reinforcement-learning  scenario,  the  agent  must  choose 
an  output  to  generate  in  response  to  each  input.  The 
reinforcement  signal  it  receives  indicates  only  how  suc¬ 
cessful  that  output  was;  it  carries  no  information  about 
how  successful  other  outputs  might  have  been.  In  ad¬ 
dition,  the  fact  that  the  reinforcement  signal  is  noisy 
means  that  each  output  will  have  to  be  generated  a 
number  of  times  in  order  for  the  agent  to  acquire  an 
accurate  picture  of  which  is  better.  In  reinforcement- 
learning  situations,  an  agent  may  choose  an  action 
because  it  expects  it  to  have  good  results;  however, 
it may also choose an action in order to gain information
about its expected results. The tradeoff
between  acting  to  gain  reinforcement  and  acting  to 
gain  information  makes  this  problem  especially  in¬ 
teresting.  The  formal  foundations  of  reinforcement 
learning  have  been  widely  studied  [Kaelbling,  1989b, 
Kaelbling,  1989a,  Narendra  and  Thathachar,  1989, 
Berry and Fristedt, 1985, Williams, 1986].

This  paper  will  focus  on  a  simple  case  of  the  re¬ 
inforcement  learning  problem  in  which  the  following 
assumptions  hold: 

•  the  agent  has  only  two  possible  actions 

•  the  reinforcement  signal  at  time  t  + 1  reflects  only 
the  success  of  the  action  taken  at  time  t 

•  reinforcement received for performing a particular
action in a particular situation is 1 with some
probability p and 0 with probability 1 - p, and each
trial is independent

•  the  expected  reinforcement  value  of  doing  a  par¬ 
ticular  action  in  a  particular  input  situation  stays 
constant  for  the  entire  run  of  the  learning  algo¬ 
rithm 

Section  6  discusses  the  extension  of  the  results  in  this 
paper  to  situations  in  which  each  of  the  above  assump¬ 
tions  is  relaxed. 

2  Complexity  Versus  Efficiency 

There are a number of good algorithms for the
reinforcement-learning scenario we are interested
in,  including  learning-automata  algorithms  [Naren¬ 
dra  and  Thathachar,  1989],  Sutton’s  reinforcement- 
comparison  methods  [Sutton,  1984],  and  Kaelbling’s 
interval-estimation  methods  [Kaelbling,  forthcoming]. 
These  algorithms  were  originally  developed  for  the  case 
when  the  agent  has  no  inputs  other  than  reinforce¬ 
ment  and  merely  needs  to  decide  which  action  it  should 
take  all  the  time.  They  can  be  extended  to  the  case 
of  having  many  input  situations  simply  by  making  a 
copy  of  the  algorithm  for  each  possible  input  situation. 
This  method  works  well,  but  results  in  algorithms  with 
space  complexity  proportional,  at  least,  to  the  number 
of possible input situations. In addition, no generalization
is exhibited; that is, the combined algorithms
do not take advantage of the common intuition that
“similar” input situations are likely to require “similar”
actions.

We  can  think  of  agents  as  learning  action  maps: 
mappings  from  input  situations  to  actions.  If  an  agent 
must  be  able  to  learn  action  maps  of  arbitrary  com¬ 
plexity,  then  the  methods  described  above  are  as  good 
as  any.  However,  if  we  restrict  the  class  of  action  maps 
that  we  expect  an  agent  to  learn,  we  can  invent  algo¬ 
rithms  for  learning  those  maps  that  are  much  more 
efficient  than  algorithms  for  the  general  case. 

A restriction that has proved useful to the concept-learning
community is to the class of functions that
can be expressed as propositional formulae in k-DNF.
A formula is said to be in disjunctive normal form
(DNF) if it is syntactically organized into a disjunction
of purely conjunctive terms; there is a simple
algorithmic method for converting any formula into
DNF [Enderton, 1972]. A formula is in the class k-DNF
if and only if its representation in DNF contains
only conjunctive terms of length k or less. There is no
restriction on the number of conjunctive terms, just
their length. Whenever k is less than the number of
atoms in the domain, the class k-DNF is a restriction
on the class of functions.

Valiant was one of the first to consider the restriction
to learning functions expressible in k-DNF
[Valiant, 1984, Valiant, 1985]. He developed the following
algorithm for learning functions in k-DNF from
input-output pairs, which actually only uses the
input-output pairs with output 0:

Let T be the set of conjunctive terms of length
k over the set of atoms (corresponding to the
input bits) and their negations and let L be
the number of learning instances required to
learn the concept to the desired accuracy.¹

for i := 1 to L do begin
    v := randomly drawn negative instance
    T := T - any term that is satisfied by v
end
return T

The algorithm returns the set of terms remaining
in T, with the interpretation that their disjunction is
the concept that was learned by the algorithm. This
method simply examines a fixed number of negative
instances and removes any term from T that would have
caused one of the negative instances to be satisfied.²
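
A minimal executable sketch of this procedure is given below. It assumes the negative instances are supplied as a list rather than drawn at random, and it represents a term as a tuple of (bit index, required value) pairs; the function names and this representation are illustrative choices, not Valiant's.

```python
from itertools import combinations, product


def all_terms(m, k):
    """All conjunctive terms of exactly k distinct literals over m input bits."""
    return [tuple(zip(idx, vals))
            for idx in combinations(range(m), k)
            for vals in product((0, 1), repeat=k)]


def satisfies(v, term):
    """True if bit-vector v gives every literal in the term its required value."""
    return all(v[j] == val for j, val in term)


def valiant_kdnf(negative_instances, m, k):
    """Remove every term satisfied by some negative instance; return the rest."""
    terms = set(all_terms(m, k))
    for v in negative_instances:
        terms = {t for t in terms if not satisfies(v, t)}
    return terms


def evaluate(terms, v):
    """Value of the learned concept: the disjunction of the remaining terms."""
    return int(any(satisfies(v, t) for t in terms))


# Hypothetical target concept over three bits: x0 AND x1.
NEGATIVES = [v for v in product((0, 1), repeat=3) if not (v[0] and v[1])]
LEARNED = valiant_kdnf(NEGATIVES, m=3, k=2)
```

When every negative instance has been seen, as in this toy case, the only surviving term is the conjunction of bits 0 and 1, so the learned disjunction agrees with the target on all eight inputs.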

The following sections describe algorithms for learning
action maps in k-DNF from reinforcement and
present the results of an empirical comparison of their
performance. For each algorithm, the inputs are
bit-vectors of length M, plus a distinguished reinforcement
bit; the outputs are single bits.


3  Connectionist Methods for Learning k-DNF

There  has  been  interesting  work  in  the  connectionist 
community  on  learning  from  reinforcement,  which  is 
relevant  to  our  goals  because  it  focuses  on  using  more 
efficient  algorithms  to  learn  action  maps  in  a  restricted 
class  of  functions.  This  section  will  describe  three  con¬ 
nectionist  methods:  a  linear  reinforcement-comparison 
method,  a  multi-layer  backpropagation  method,  and 
a  hybrid  method  that  combines  Valiant’s  algorithm 
for  concept  learning  with  the  linear  reinforcement- 
comparison  method. 

These and other algorithms will be described in a
standard form consisting of three components: $s_0$ is
the initial internal state of the algorithm; u(s,i,a,r)
is the update function, which takes the state of the
algorithm s, the last input i, the last action a, and the
reinforcement value received r, and generates a new
algorithm state; and e(s,i) is the evaluation function,
which takes an algorithm state s and an input i, and
generates an action.


3.1  Linear Reward-Comparison Method

Most of the connectionist methods are simple single-layer
algorithms that can learn action maps in the class
of linearly separable functions [Widrow et al., 1973,
Sutton, 1984, Barto and Anandan, 1985]. Sutton [Sutton,
1984] performed extensive experiments on such
methods and found that reinforcement-comparison
algorithms tend to have the best performance. The
equations below define Algorithm 8 from his dissertation
[Sutton, 1984], which uses a version of the
Widrow-Hoff or Adaline [Widrow and Hoff, 1960]
weight-update algorithm.


The input is represented as an
M-dimensional vector i. The internal state,
$s_0$, consists of two M-dimensional vectors, v
and w.

u(s,i,a,r) =  let $p := \sum_{j=1}^{M} v_j i_j$
              for j = 1 to M do begin
                  $w_j := w_j + \alpha (r - p)(a - 1/2)\, i_j$
                  $v_j := v_j + \beta (r - p)\, i_j$
              end

e(s,i) =  1  if $\sum_{j=1}^{M} w_j i_j + \nu > 0$
          0  otherwise

where $\alpha > 0$, $0 < \beta < 1$, and $\nu$ is a normally
distributed random variable of mean 0 and
standard deviation $\delta_\nu$.

¹This choice is not relevant to our reinforcement-learning
scenario; the details are described in Valiant's
papers [Valiant, 1984, Valiant, 1985].

²Valiant’s presentation of the algorithm defines T to be
the set of conjunctive terms of length k or less over the set
of atoms and their negations; however, because any term
of length less than k can be represented as a disjunction of
terms of length k, we use a smaller set T for simplicity in
exposition and slightly more efficient computation time.

The output, e(s,i), has value 1 or 0 depending on
the inner product of w and i and the value of the random
variable $\nu$. The addition of the random value
causes the algorithm to “experiment” by occasionally
performing actions that it would not otherwise have
taken. The updating of the vector w is somewhat
complicated: each component is incremented by a value
with four terms. The first term, $\alpha$, is a constant that
represents the learning rate. The next term, r - p,
represents the difference between the actual reinforcement
received and the predicted reinforcement, p. This
serves to normalize the reinforcement values: the absolute
value of the reinforcement signal is not as important
as its value relative to the average reinforcement
that the agent has been receiving. The predicted
reinforcement, p, is generated using a standard linear
associator that learns to associate input vectors with
reinforcement values by setting the weights in vector v.
The third term in the update function for w is a - 1/2:
it has constant absolute value and the sign is used to
encode which action was taken. The final term is $i_j$,
which causes the jth component of the weight vector
to be adjusted in proportion to the jth value of the
input.

The  space  required  for  the  state,  as  well  as  time  for 
both  update  and  evaluation  operations  is  O(M),  where 
M  is  the  number  of  input  bits. 
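
The update and evaluation functions of this section can be sketched in a few lines. This is an illustrative rendering under stated assumptions, not Sutton's or the author's code: the class name, the default values of `alpha`, `beta`, and `noise_sd`, and the zero initial weights are all hypothetical.

```python
import random


class LinearReinforcementComparison:
    """State is the pair of M-dimensional weight vectors (v, w)."""

    def __init__(self, m, alpha=0.3, beta=0.1, noise_sd=0.1, seed=0):
        self.w = [0.0] * m  # action weights
        self.v = [0.0] * m  # reinforcement-prediction weights
        self.alpha, self.beta, self.noise_sd = alpha, beta, noise_sd
        self.rng = random.Random(seed)

    def evaluate(self, i):
        """e(s, i): 1 if the inner product w . i plus Gaussian noise is positive."""
        activation = sum(wj * ij for wj, ij in zip(self.w, i))
        noise = self.rng.gauss(0.0, self.noise_sd)
        return 1 if activation + noise > 0 else 0

    def update(self, i, a, r):
        """u(s, i, a, r): adjust w and v using the prediction error r - p."""
        p = sum(vj * ij for vj, ij in zip(self.v, i))
        for j, ij in enumerate(i):
            self.w[j] += self.alpha * (r - p) * (a - 0.5) * ij
            self.v[j] += self.beta * (r - p) * ij


learner = LinearReinforcementComparison(m=2)
learner.update(i=[1, 0], a=1, r=1)  # rewarded for action 1 on input (1, 0)
```

After this single rewarded trial, the weight on the active input bit moves toward action 1 and the prediction for that input rises, while the weights on the inactive bit are untouched; each call does O(M) work, matching the bound stated above.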

3.2  Multi-layer Back-propagation Method

Error back-propagation is a method for training
connectionist networks that are composed of multiple layers.
Anderson [Anderson, 1986] has designed a connectionist
system with multiple layers that uses backpropagation
as a method for learning from reinforcement.

Anderson’s  system  uses  two  networks:  one  for  learn¬ 
ing  to  predict  reinforcement  and  one  for  learning  which 
action  to  take.  Each  of  these  is  a  two-layer  network, 
with  all  of  the  hidden  units  connected  to  all  of  the 
inputs  and  all  of  the  inputs  and  hidden  units  con¬ 
nected  to  the  outputs.  The  system  was  designed  to 
work  in  worlds  with  delayed  reinforcement  (which  are 
discussed  here  at  greater  length  in  Section  6),  but  it  is 
easily  modified  to  work  in  our  simpler  domain.  This 
algorithm  is  rather  complex,  so  space  does  not  allow 
it  to  be  described  further.  A  clear  description  can  be 
found  in  Anderson’s  dissertation  [Anderson,  1986]. 

This method is theoretically able to learn very complex
functions, but tends to require many training in¬
stances  before  it  converges.  The  time  and  space  com¬ 
plexity  for  this  algorithm  is  O(MH),  where  M  is  the 
number  of  input  bits  and  H  is  the  number  of  hidden 
units. 

3.3  A  Hybrid  Algorithm 

Given our interest in restricted classes of functions,
we can construct a new hybrid algorithm for learning
action maps in k-DNF. It hinges on the simple observation
that any such function can be expressed as a
linear combination of terms in the set T, where T is
the set of conjunctive terms of length k over the set
of atoms (corresponding to the input bits) and their
negations. It is possible to take the original M-bit input
signal and transduce it to a wider signal that is the
result of evaluating each member of T on the original
inputs. We can use this new signal as input to a relatively
simple connectionist learning algorithm, such as
the one described in Section 3.1 above.

If there are M input bits, the set T has size C(2M, k)
because we are choosing from the set of bits and their
negations. However, we can eliminate all elements that
contain both an atom and its negation, yielding a set of
size 2^k C(M, k). The space required by the algorithm,
as well as the time to update the internal state or to
evaluate an input instance, is proportional to the size
of T, and thus O(M^k). It is important to note that
this algorithm (as well as the other three discussed in
this paper) is strictly incremental: its time and space
requirements depend only on the size of the input and
on the fixed parameter k and do not increase over the
course of a run.
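
The construction of T and the transduction step can be
sketched as follows. This is a minimal illustration, not
the original implementation: the names build_terms,
satisfies, and transduce are hypothetical, and a term is
represented as a tuple of (bit index, required value)
pairs, which is an assumption made here for convenience.

```python
from itertools import combinations, product
from math import comb


def build_terms(m, k):
    """Enumerate conjunctive terms of length k over m input bits.

    Each term is a tuple of (bit_index, required_value) literals.
    Choosing k distinct bits first means no term can contain both
    an atom and its negation.
    """
    terms = []
    for bits in combinations(range(m), k):
        for values in product((1, 0), repeat=k):
            terms.append(tuple(zip(bits, values)))
    return terms


def satisfies(inp, term):
    """True when every literal of the term agrees with the input."""
    return all(inp[bit] == value for bit, value in term)


def transduce(inp, terms):
    """Map the m-bit input to the wider signal with one bit per term."""
    return [1 if satisfies(inp, t) else 0 for t in terms]


terms = build_terms(3, 2)
# The size matches the count given in the text: 2^k * C(M, k).
assert len(terms) == 2 ** 2 * comb(3, 2)
wide = transduce((1, 0, 1), terms)
```

For any input, exactly one sign assignment per choice of k
bits is satisfied, so the wide signal always has C(M, k)
ones; a linear learner then only has to weight those terms.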

4  Interval-Estimation Algorithm for k-DNF

The interval-estimation algorithm for k-DNF is, like
the hybrid algorithm described in Section 3.3, based
on Valiant's algorithm, but the interval-estimation
algorithm uses standard statistical estimation methods
rather than connectionist weight-adjustments. The
technique of interval estimation has also been applied
to other reinforcement-learning problems [Kaelbling,
forthcoming].

4.1  General  Description 

This section will describe the algorithm independent
of particular statistical tests, which will be introduced
in the next section. We shall need the following
definitions, however. An input bit-vector satisfies a term
whenever all the bits mentioned positively in the term
have value 1 in the input and all the bits mentioned
negatively in the term have value 0 in the input. The
quantity er(t,a) is the expected value of the reinforcement
that the agent will gain, per trial, if it generates
action a whenever term t is satisfied by the input and
action ¬a otherwise. The quantity ubr_α(t,a) is the upper
bound of a 100(1 − α)% confidence interval on the
expected reinforcement gained from performing action
a whenever term t is satisfied by the input. We can
now give the formal definition of the algorithm:

s_0 = the set T, with a collection of statistics
      associated with each member of the set

e(s, i) = for each t in s
              if i satisfies t and
                 ubr_α(t, 1) > ubr_α(t, 0) and
                 Pr(er(t, 1) = er(t, 0)) < β
              then return 1
          return 0

u(s, i, a, r) = for each t in s
                    update_term_statistics(t, i, a, r)
                return s

At any moment in the operation of this algorithm,
we can extract a symbolic description of its current
hypothesis. It is the disjunction of all terms t such that
ubr_α(t, 1) > ubr_α(t, 0) and Pr(er(t, 1) = er(t, 0)) < β.
This is the k-DNF expression according to which the
agent is choosing its actions.


The  evaluation  criterion  is  chosen  in  such  a  way  as 
to  make  the  important  trade-off  between  acting  to  gain 
information  and  acting  to  gain  reinforcement.  A  naive 
method  would  be  for  each  term  to  generate  a  1  when¬ 
ever  action  1  has  had  a  higher  success  rate  than  action 
0.  This  would  be  a  very  bad  strategy,  however,  be¬ 
cause  if  the  first  trial  of  action  0  failed,  its  success  rate 
would  be  0,  causing  action  0  never  to  be  chosen  again. 
The  interval  estimation  method  works  because  of  the 
fact  that  the  value  of  ubr  can  be  high  for  two  rea¬ 
sons.  It  may  be  high  because  the  confidence  interval 
is  very  large  due  to  the  action  not  having  been  tried 
very  often — this  will  cause  the  action  to  be  chosen  in 
order  to  gain  information.  The  upper  bound  may  also 
be  high  because  the  confidence  interval  is  small  and 
the  action  has  a  genuinely  high  payoff— this  will  cause 
an  action  to  be  chosen  in  order  to  gain  reinforcement. 
At  the  beginning  of  a  course  of  execution  of  this  al¬ 
gorithm,  actions  are  chosen  almost  at  random,  until 
the  upper  bound  of  the  worse  action  is  driven  down 
by  sampling,  while  the  upper  bound  of  the  other  stays 
high. The value of α determines the size of the confidence
interval: when it is small the confidence interval
is large and the algorithm is very conservative. It is
not likely to converge to the wrong action, but it may
take a long time to converge. As α is increased, the
confidence intervals become smaller, the learning rate
faster, and the chance of gross error higher.
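
The two reasons an upper bound can be high are easy to see
numerically. The sketch below assumes the standard
score-interval (Wilson) approximation for a Bernoulli
parameter; the function name ub, the handling of n = 0, and
the sample counts are illustrative assumptions, not values
from the experiments.

```python
from math import sqrt
from statistics import NormalDist


def ub(s, n, alpha):
    """Approximate upper end of a 100(1 - alpha)% interval for p.

    s successes in n Bernoulli trials. Returning 1.0 for n == 0 is
    an assumption: an untried action is treated as maximally promising.
    """
    if n == 0:
        return 1.0
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    p_hat = s / n
    centre = p_hat + z * z / (2 * n)
    spread = (z / sqrt(n)) * sqrt(p_hat * (1 - p_hat) + z * z / (4 * n))
    return (centre + spread) / (1 + z * z / n)


barely_tried = ub(0, 1, 0.05)      # one failure: wide interval, bound stays high
proven_good = ub(900, 1000, 0.05)  # narrow interval around a high payoff
proven_poor = ub(100, 1000, 0.05)  # narrow interval around a low payoff
```

An action that failed on its single trial still has a high
bound, so it is tried again, whereas the naive success-rate
rule would abandon it. Lowering alpha widens every interval,
which is the conservative behavior described above.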

Let  the  equivalence  probability  of  a  term  be  the  prob¬ 
ability  that  the  expected  reinforcement  is  the  same  no 
matter  what  choice  of  action  is  made  when  the  term  is 
satisfied.  The  second  requirement  for  a  term  to  cause 
a  1  to  be  emitted  is  that  the  equivalence  probability  be 
small.  Without  this  criterion,  terms  for  which  no  ac¬ 
tion  is  better  will,  roughly,  alternate  between  choosing 
action  1  and  action  0.  Because  the  output  of  the  entire 
algorithm  will  be  1  whenever  any  term  has  the  value 
1,  this  alternation  of  values  can  cause  a  large  number 
of  wrong  answers.  Thus,  if  we  can  convince  ourselves 
that  a  term  is  irrelevant  by  showing  that  its  choice  of 
action  makes  no  difference,  we  can  safely  ignore  it. 
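
A sketch of how the equivalence criterion might be checked,
assuming a pooled two-sample test on the two success rates
of a term. The function name actions_differ and the counts
in the example are hypothetical, and the test is only
meaningful once both actions have been tried at least once.

```python
from math import sqrt
from statistics import NormalDist


def actions_differ(s0, n0, s1, n1, beta):
    """Two-sided pooled z-test on two Bernoulli success rates.

    Returns True when the hypothesis that action 0 and action 1
    have the same success probability is rejected at level beta.
    Requires n0 > 0 and n1 > 0.
    """
    pooled = (s0 + s1) / (n0 + n1)
    variance = pooled * (1 - pooled) * (n0 + n1) / (n0 * n1)
    if variance == 0:
        # All trials succeeded or all failed: no evidence of a difference.
        return False
    z = (s0 / n0 - s1 / n1) / sqrt(variance)
    return abs(z) >= NormalDist().inv_cdf(1 - beta / 2)


# A term whose choice of action makes no real difference is ignored...
irrelevant = actions_differ(48, 100, 52, 100, beta=0.05)
# ...while a term with clearly different payoffs is allowed to vote.
relevant = actions_differ(30, 100, 70, 100, beta=0.05)
```

Only terms for which the test rejects equality can emit a 1,
which suppresses the alternation between actions that would
otherwise corrupt the disjunction.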


4.2  Statistics

In the simple reinforcement-learning scenario we are
considering, the necessary statistical tests are also
quite simple. For each term, we store the following
statistics: n_0, the number of trials of action 0; s_0, the
number of successes of action 0; n_1, the number of
trials of action 1; and s_1, the number of successes of
action 1. These statistics are incremented only when
the associated term is satisfied by the current input
instance.

If n is the number of trials and s the number of
successes arising from a series of Bernoulli trials with
success probability p, the upper bound of a 100(1 − α)
percent confidence interval for p can be approximated
by

\[
ub(s, n, \alpha) =
\frac{\dfrac{s}{n} + \dfrac{z_{\alpha/2}^{2}}{2n}
      + \dfrac{z_{\alpha/2}}{\sqrt{n}}
        \sqrt{\dfrac{s}{n}\left(1 - \dfrac{s}{n}\right)
              + \dfrac{z_{\alpha/2}^{2}}{4n}}}
     {1 + \dfrac{z_{\alpha/2}^{2}}{n}},
\]

where z_{α/2} is such that Pr(Z > z_{α/2}) = Pr(Z <
−z_{α/2}) = α/2 when Z is a standard normal random
variable [Larsen and Marx, 1986]. This allows
us to define ubr_α(t,0) as ub(s_0, n_0, α) and ubr_α(t,1) as
ub(s_1, n_1, α), where s_0, n_0, s_1, and n_1 are the statistics
associated with term t.

To test for equality of the underlying Bernoulli
parameters, we use a two-sided test at the β level of
significance that rejects the hypothesis that the
parameters are equal whenever

\[
\frac{\dfrac{s_0}{n_0} - \dfrac{s_1}{n_1}}
     {\sqrt{\left(\dfrac{s_0 + s_1}{n_0 + n_1}\right)
            \left(1 - \dfrac{s_0 + s_1}{n_0 + n_1}\right)
            \left(\dfrac{n_0 + n_1}{n_0 n_1}\right)}}
\quad \text{is either } \le -z_{\beta/2} \text{ or } \ge +z_{\beta/2},
\]

where z_{β/2} is a standard normal deviate [Larsen and
Marx, 1986]. Because sample size is important for this
test, the algorithm is slightly modified to ensure that,
at the beginning of a run, each action is chosen a minimum
number of times, referred to by the parameter β_min.

The complexity of this algorithm is the same order
as that of the hybrid connectionist algorithm of Section
3.3, namely O(M^k).

5  Empirical Comparison

This section reports the results of a set of experiments
designed to compare the performance of the algorithms
discussed in this paper.

5.1  Algorithms and Environments

The following algorithms were tested in these experiments:

•  LINCONN  Linear reinforcement-comparison algorithm

•  LINCONN+  Linear reinforcement-comparison with
an extra input wired to have a constant value

•  CONNKDNF  Hybrid connectionist algorithm for k-DNF

•  IEKDNF  Interval-estimation algorithm for k-DNF

•  BP  Anderson's error back-propagation algorithm

•  IE  Basic interval-estimation algorithm

The basic interval-estimation algorithm IE [Kaelbling,
forthcoming] is included as a yardstick; it is
computationally much more complex than the other algorithms
and will very likely out-perform them.

Each of the algorithms was tested in three different
environments. The environments are called binomial
Boolean expression worlds and can be characterized by
the following parameters: M, expr, p_1s, p_1n, p_0s, and
p_0n. The parameter M is the number of input bits;
expr is a Boolean expression over the input bits; p_1s is
the probability of receiving reinforcement value 1 given
that action 1 is taken when the input instance satisfies
expr; p_1n is the probability of receiving reinforcement
value 1 given that action 1 is taken when the input
instance does not satisfy expr; p_0s is the probability
of receiving reinforcement value 1 given that action 0
is taken when the input instance satisfies expr; and
p_0n is the probability of receiving reinforcement value
1 given that action 0 is taken when the input instance
does not satisfy expr. Input vectors are chosen by the
world according to a uniform probability distribution.

Table 1: Parameters of test environments for k-DNF
experiments.
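
A binomial Boolean expression world as characterized above
can be simulated in a few lines. This is a sketch under
stated assumptions: the class and function names are
hypothetical, and the expression and probabilities in the
example are invented for illustration, not the values of
Table 1.

```python
import random
from itertools import product


class BinomialBooleanWorld:
    """World with M input bits, a Boolean expression, and four payoffs."""

    def __init__(self, m, expr, p1s, p1n, p0s, p0n, seed=None):
        self.m = m
        self.expr = expr
        # Success probability indexed by (action, input satisfies expr).
        self.p = {(1, True): p1s, (1, False): p1n,
                  (0, True): p0s, (0, False): p0n}
        self.rng = random.Random(seed)

    def next_input(self):
        """Draw an input vector uniformly at random."""
        return tuple(self.rng.randint(0, 1) for _ in range(self.m))

    def reinforce(self, inp, action):
        """Return reinforcement 1 with the probability for this case, else 0."""
        prob = self.p[(action, bool(self.expr(inp)))]
        return 1 if self.rng.random() < prob else 0


def expected_reinforcement(world, policy):
    """Exact expected reinforcement per tick of a deterministic policy."""
    total = 0.0
    for inp in product((0, 1), repeat=world.m):
        total += world.p[(policy(inp), bool(world.expr(inp)))]
    return total / 2 ** world.m


# Hypothetical example: 3 bits, expression i0 AND i1, invented payoffs.
world = BinomialBooleanWorld(3, lambda i: i[0] and i[1],
                             p1s=0.9, p1n=0.1, p0s=0.1, p0n=0.9, seed=0)
optimal = expected_reinforcement(world, lambda i: 1 if i[0] and i[1] else 0)
```

Enumerating all inputs gives the exact value of a policy,
which is how reference figures such as the reinforcement of
optimal behavior could be obtained for small M.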

Table 1 shows the values of these parameters for each
task. The first task has the simple, linearly separable
expression (i_0 ∧ i_1) ∨ (i_1 ∧ i_2); what makes it difficult
is the small separation between the reinforcement
probabilities. Task 2 has highly differentiated
reinforcement probabilities, but the function to be learned,
(i_0 ∧ ¬i_1) ∨ (i_1 ∧ ¬i_2) ∨ (i_2 ∧ ¬i_0), is a complex
exclusive-or. Finally, Task 3 is the simple conjunctive function
i_2 ∧ ¬i_5, but all of the reinforcement probabilities are
high and there are 6 input bits rather than only 3.

5.2  Parameter  Tuning 

Each of the algorithms has a set of parameters. For
both IEKDNF and CONNKDNF, k = 2. The simple
connectionist algorithms LINCONN and LINCONN+ as
well as CONNKDNF have parameters α, β, and σ.
Following Sutton [Sutton, 1984], parameters β and σ in
CONNKDNF, LINCONN, and LINCONN+ are fixed at
.1 and .3, respectively. The IEKDNF
algorithm has two confidence-interval parameters, z_{α/2}
and z_{β/2}, and a minimum age for the equality test,
β_min, while the IE algorithm has only z_{α/2}. Finally,
the BP algorithm has a large set of parameters: β,
learning rate of the evaluation output units; β_h, learning
rate of the evaluation hidden units; ρ, learning rate
of the action output units; and ρ_h, learning rate of the
action hidden units. In each of the tasks, the BP
algorithm had as many hidden units as inputs.

All of the parameters for each algorithm were
chosen to optimize the behavior of that algorithm on
the chosen task. The success of an algorithm was
measured by the average reinforcement received per tick,
averaged over the entire run. For each algorithm and
environment, a series of 100 trials of length 3000 was
run with different parameter values. Table 2 shows the
best set of parameter values found for each
algorithm-environment pair.

5.3  Results 

Having chosen the best parameter values for each
algorithm and environment, we compared the performance
of the algorithms on runs of length 3000 using
the parameter settings of Table 2. The performance
metric was average reinforcement per tick, averaged
over the entire run. The results are shown in Table
3, together with the expected reinforcement of executing
a completely random behavior (choosing actions
0 and 1 with equal probability) and of executing the
optimal behavior. These results do not tell the entire
story, however. It is important to test for statistical
significance to be relatively sure that the ordering of
one algorithm over another did not arise by chance.
Figures 1, 2 and 3 show, for each task, a pictorial
representation of the results of a 1-sided t-test applied to
each pair of experimental results. The graphs encode a
partial order of significant dominance, with solid lines
representing significance at the .95 level and dashed
lines representing significance at the .85 level.

Table 2: Best parameter values for each k-DNF
algorithm in each environment.

ALG-TASK      1       2       3
LINCONN     .5329   .7418   .7759
LINCONN+    .5456   .7459   .7722
CONNKDNF    .5783   .8903   .7825
IEKDNF      .5789   .8900   .7993
BP          .5456   .7406   .7852
IE          .5827   .8966   .7872
random      .5000   .5000   .6750
optimal     .6000   .9000   .8250

Table 3: Average reinforcement for k-DNF problems
over 100 runs of length 3000.

Figure 1: Significant dominance partial order among
k-DNF algorithms for Task 1.

Figure 2: Significant dominance partial order among
k-DNF algorithms for Task 2.
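
A pairwise dominance check of the kind used for these
significance tests might look like the sketch below. The
text does not say which variant of the t-test was used, so
two things here are assumptions: the unequal-variance
(Welch) form of the statistic, and the use of the normal
distribution in place of the t distribution, which is a
reasonable approximation with 100 runs per algorithm. The
name dominates is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def dominates(a, b, level=0.95):
    """One-sided test that the mean of sample a exceeds that of sample b.

    a and b hold the per-run average reinforcement of two algorithms.
    Returns True when the difference is significant at the given level.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return mean(a) > mean(b)
    t = (mean(a) - mean(b)) / se
    # Normal approximation to the t distribution (an assumption).
    return NormalDist().cdf(t) >= level


# Invented per-run scores for two hypothetical algorithms.
runs_a = [0.58 + 0.001 * (i % 5) for i in range(100)]
runs_b = [0.54 + 0.001 * (i % 5) for i in range(100)]
a_beats_b = dominates(runs_a, runs_b)
```

Applying the check to every ordered pair of algorithms at
the .95 and .85 levels yields the edges of a dominance graph.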

With the best parameter values for each algorithm,
it is also of some interest to compare the rate at which
performance improves as a function of the number of
training instances. Figures 4, 5, and 6 show superimposed
plots of the learning curves for each of the
algorithms. Each point represents the average reinforcement
received over a sequence of 100 steps, averaged
over 100 runs of length 3000.
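
The points of such a learning curve can be computed as
follows. This is a minimal sketch; the function name
learning_curve and the representation of a run as a list of
per-tick reinforcement values are assumptions.

```python
def learning_curve(runs, window=100):
    """Average reinforcement per window of ticks, averaged over runs.

    runs is a list of equal-length sequences of per-tick reinforcement.
    Returns one point per consecutive block of `window` ticks.
    """
    length = len(runs[0])
    points = []
    for start in range(0, length, window):
        block_means = []
        for run in runs:
            block = run[start:start + window]
            block_means.append(sum(block) / len(block))
        points.append(sum(block_means) / len(block_means))
    return points


# Two tiny invented runs: one improves after 100 ticks, one is always right.
curve = learning_curve([[0] * 100 + [1] * 100, [1] * 200])
```

With runs of length 3000 and a window of 100 steps this
produces 30 points per algorithm.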

5.4  Discussion 

On Tasks 1 and 2 the basic interval-estimation
algorithm, IE, performed significantly better than any of
the other algorithms. The magnitude of its superiority,
however, is not great: Figures 4 and
5 reveal that the IEKDNF and CONNKDNF algorithms
have similar performance characteristics both to each
other and to IE. On these two tasks, the overall
performance of IEKDNF and that of CONNKDNF were not
found to be significantly different.

The back-propagation algorithm, BP, performed
considerably worse than expected on Tasks 1 and 2. It
is very difficult to tune the parameters for this
algorithm, so its bad performance may be explained by
a sub-optimal setting of parameters.³ However, it is
possible to see in the learning curves of Figures 4 and
5 that the performance of BP was still increasing at
the ends of the runs. This may indicate that with
more training instances it would eventually converge
to optimal performance.

Figure 3: Significant dominance partial order among
k-DNF algorithms for Task 3.

Figure 4: Learning curves for Task 1.

Figure 5: Learning curves for Task 2.

Figure 6: Learning curves for Task 3.

The simple linear connectionist algorithms
performed poorly on both Tasks 1 and 2. This poor
performance was expected on Task 2, because such
algorithms are known to be unable to learn
non-linearly-separable functions. Task 1 is difficult for these
algorithms because, during the execution of the
algorithm, the evaluation function is often too complex to
be learned by the simple linear associator. Adding a
constant input value to the simple linear connectionist
algorithm made a significant improvement in
performance; this is not surprising, because it allows
discrimination hyperplanes that do not pass through the
origin of the space to be found.

Task  3  reveals  many  interesting  strengths  and  weak¬ 
nesses  of  the  algorithms.  One  of  the  most  interesting 
is  that  IE  is  no  longer  the  best  performer.  Because  the 
target  function  is  simple  and  there  is  a  larger  num¬ 
ber  of  input  bits,  the  ability  to  generalize  across  input 
instances  becomes  important.  The  IEKDNF  algorithm 
is  able  to  find  the  correct  hypothesis  early  during  the 
run  (this  is  apparent  in  the  learning  curve  of  Figure 
6).  However,  because  the  reinforcement  values  are  not 
highly  differentiated  and  because  the  size  of  the  set  T 
is  quite  large,  it  begins  to  include  extraneous  terms  due 
to  statistical  fluctuations  in  the  environment,  causing 
slightly  degraded  performance. 

The IE, BP, and CONNKDNF algorithms all have very
similar performance on Task 3, with the linear
connectionist algorithms performing slightly worse, but still
reasonably well.

6  Relaxing  the  Assumptions 

This section will discuss the consequences of relaxing
the assumptions made at the beginning of this paper,
especially in the context of the two better-performing
algorithms, IEKDNF and CONNKDNF. In some cases,
simple changes can be made to the algorithms that will
allow them to work in the more general situations. In
others, there are theoretical problems that make
extensions difficult. Each of the concrete extensions
proposed to the IEKDNF algorithm has been implemented
and tested.

Thus far we have assumed that the agent has only
two possible actions. Many of the early
learning-automata algorithms are directly applicable to
problems with more than two actions. It has also been
shown [Kaelbling, forthcoming] that the problem of
generating actions specified by N output bits can be
solved by N interconnected modules that learn to
generate one output bit from reinforcement. Thus, the
algorithms presented here could be applied, using this
method, to problems with many possible outputs.

³ In the parameter tuning phase, the parameters were
varied independently. It may well be necessary to perform
gradient-ascent search in the parameter space, but that is
a computationally difficult task, especially when the
evaluation of any point in parameter space may have a high
degree of noise.

The  problem  of  delayed  reinforcement  has  been 
addressed  by  Sutton  [Sutton,  1988]  and  Watkins 
[Watkins,  1989],  among  others.  Sutton’s  solution, 
called  the  temporal  difference  method  (TD)  can  be 
abstracted  away  from  the  particular  reinforcement- 
learning  mechanism  being  used.  It  provides  a  mod¬ 
ule  that  learns  to  transduce  the  delayed  reinforcement 
signal  that  is  coming  from  the  world  into  an  immedi¬ 
ate  reinforcement  signal  that  evaluates  each  state  of 
the  world  to  be  the  expected  future  reward  based  on 
the  agent’s  current  strategy.  Because  this  local  rein¬ 
forcement  signal  must  be  learned,  using  a  TD  module 
violates  a  different  one  of  our  assumptions:  that  the 
expected  reinforcement  of  performing  an  action  in  a 
situation  be  fixed  over  the  course  of  a  run.  This  will 
be  addressed  below. 

If  the  reinforcement  provided  by  the  world  cannot 
be  modeled  as  independent  trials  of  some  sort,  then 
it  is  very  difficult  to  use  explicit  statistical  methods. 
The  connectionist  algorithms  are  implicitly  statistical 
and  would  also  have  trouble  in  such  worlds.  How¬ 
ever,  if  the  trials  are  independent,  we  have  a  variety  of 
different  statistical  models  available.  The  CONNKDNF 
algorithm,  as  presented,  can  be  used  when  the  rein¬ 
forcement  is  real-valued.  The  IEKDNF  algorithm  can 
be  implemented  with  different  statistical  tests.  For 
instance,  if  we  know  that  the  reinforcement  values  for 
each  input-action  pair  are  normally  distributed,  we  can 
use standard statistical methods to construct confidence
intervals and to test for equality of means. If we
have no model, we can use non-parametric methods.

Finally,  we  consider  the  case  of  having  the  expected 
reinforcement  of  performing  an  action  in  a  situation 
change  during  the  course  of  a  run.  The  CONNKDNF 
algorithm  will  work  in  such  cases,  although  it  might  be 
necessary  to  adjust  its  parameters.  The  statistically- 
based  IEKDNF  algorithm  can  be  modified  to  work,  by 
causing its statistics to decay over time. If an action
has not been tried for a long time, its n value will
slowly decay, which will cause its confidence interval
to grow larger. Eventually it will grow large enough
for that action to be chosen again. If the action has
good results, the policy will be changed to favor this
action.
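
The effect of decaying statistics on the upper bound can be
illustrated with a short sketch. Several details are
assumptions made for illustration: the multiplicative decay
factor gamma, the choice to decay s together with n so that
the observed success rate is preserved, and the
score-interval form of the bound.

```python
from math import sqrt
from statistics import NormalDist


def ub(s, n, alpha):
    """Approximate upper confidence bound for a Bernoulli parameter."""
    if n <= 0:
        return 1.0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = s / n
    centre = p_hat + z * z / (2 * n)
    spread = (z / sqrt(n)) * sqrt(p_hat * (1 - p_hat) + z * z / (4 * n))
    return (centre + spread) / (1 + z * z / n)


def decay(s, n, gamma):
    """Shrink both counts; the success rate s/n is unchanged."""
    return s * gamma, n * gamma


# An action that looked poor (20 successes in 100 trials) and is then ignored.
s, n = 20.0, 100.0
bound_before = ub(s, n, 0.05)
for _ in range(500):          # 500 ticks without trying the action
    s, n = decay(s, n, gamma=0.99)
bound_after = ub(s, n, 0.05)
```

As the effective sample size shrinks, the bound rises toward
the level of an untried action, so the action is eventually
sampled again and a changed payoff can be detected.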

7  Conclusion 

From this study, we can see that it is useful to
design algorithms that are tailored to learning certain
restricted classes of functions. The two
specially-designed algorithms far out-performed standard
methods of comparable complexity. The CONNKDNF and
IEKDNF algorithms each have their strengths and
weaknesses. It is possible that CONNKDNF may
out-perform IEKDNF to some extent because in CONNKDNF
each term gets to contribute to the answer with
different degrees. This avoids errors that occur in IEKDNF
when a single term is barely over the threshold for
generating a 1. On the other hand, the state of IEKDNF
has internal semantics that are clear and directly
interpretable in the language of classical statistics. This
simplifies the process of extending the algorithm to
apply to other types of worlds in a principled manner.

Important future work will be to identify other
restricted classes of functions that can be learned
efficiently and effectively from reinforcement and
demonstrate that these classes contain functions that solve
interesting and important problems from the real world.
Abstract

This  paper  describes  an  implemented  architecture  for  intermediate  vision.  By  in¬ 
tegrating  a  variety  of  intermediate  visual  mechanisms  and  putting  them  to  use  in 
support  of  concrete  activity,  the  implementation  demonstrates  their  utility.  The 
system,  SIVS,  models  psychophysical  discoveries  about  visual  attention  and  search. 
It  is  designed  to  be  efficiently  implementable  in  biologically  realistic  hardware. 

SIVS addresses five fundamental problems. Visual attention is required to restrict
processing to task-relevant locations in the image. Visual search finds such
locations. Visual routines are a means for nonuniform processing based on task
demands. Intermediate objects keep track of intermediate results of this processing.
Visual operators are a set of relatively abstract, general-purpose primitives for
spatial analysis, out of which visual routines are assembled.
Introduction


This paper presents an implemented architecture for intermediate vision: the mechanisms that
connect bottom-up early vision with later, task-specific processing. The system, SIVS,

•  models  relevant  psychophysical  results, 

•  obeys  the  constraints  imposed  by  biologically  plausible  hardware, 

•  addresses  key  computational  problems  in  vision  that  are  often  passed  over,  and 

•  integrates  a  variety  of  mechanisms  to  support  complex  activity  in  a  realistic  task  domain. 

Background  and  summary 

Unlike  some  machine  vision  systems  which  seek  engineering  solutions  by  whatever  means,  SIVS 
is  intended  to  model  specifically  biological  vision.  The  inputs  and  the  first  few  early  stages  of 
mammalian  visual  processing  are  relatively  well  understood  as  a  result  of  neurophysiological 
studies  and  computational  modeling  [20,  30,  31].  We  know  less  about  the  nature  of  processing 
after  early  vision  and  before  the  outputs.  Computational  studies  have  mainly  addressed  the 
problem  of  object  recognition  by  shape  matching.  Object  recognition,  often  referred  to  as  late 
vision,  is  an  important  part  of  visual  processing,  but  there  is  much  evidence  (reviewed  later 
in  the  paper)  for  intermediate  visual  processes  in  addition.  Vision  does  not  leap  from  early 
representations  to  final  outputs  in  a  single  step.  Unfortunately,  relatively  little  is  known  about 
intermediate  visual  processing.  There  is  little  relevant  neurophysiological  evidence,  for  example. 

Progress  at  this  point  seems  to  require  the  construction  of  plausible  models  which  can  suggest 
questions  for  neuroscientific,  psychological,  and  computational  experiments.  Such  a  model  must 
respect  the  evidence  that  is  available,  even  if  it  is  scanty;  the  model  this  paper  proposes  is 
informed  by  psychophysical,  neurocomputational,  and  engineering  evidence.  SIVS  models  the 
psychophysics  of  visual  attention  and  search  in  detail.  It  is  designed  to  be  implementable  in 
slow, massively parallel, locally connected hardware, such as that found in the brain. It is based
on an engineering analysis of the intermediate vision task, and it has proven adequate to support
visually-guided activity in a complex domain.

The intermediate visual processes I posit perform non-local computations with
representations of portions of the image. Thus they contrast with early vision, which is concerned with
local computations, and with late vision, which produces representations of external objects.
They also span the gap between early and late processing in terms of the sorts of encodings of
information used. Early vision maps the retinal inputs point-by-point into retinotopic
representations. Late vision probably encodes its outputs with what I will call compact encodings, small
groups of neurons which together represent a particular property of a scene. These properties
might be coded as boolean values or continuous scalars. For example, Perrett et al. describe
experiments that suggest that individual monkey neurons respond selectively to faces [39]. (The
interpretation of these and related experiments is still controversial; see [31] for a review.) The
input encodings for late vision are unknown, and in any case not well defined since the scope of
"late" vision is itself not well defined. However, it seems likely that the inputs are also in the
form of compact encodings of properties of regions of the image. Thus, the intermediate visual
processes I propose fill the gap by reducing retinotopic representations to compact ones (figure
1).

Figure 1: Early, intermediate, and late vision. Early processing computes, point by point,
retinotopic maps from the retinal image; intermediate vision reduces these maps to compact
encodings; late vision computes exclusively with compact encodings.

These  intermediate  processes  are  intended  to  be  roughly  equidistant  from  early  vision  and 
final  outputs.  Thus,  should  the  proposed  mechanisms  be  found  to  model  human  performance, 
they  will  place  considerable  constraint  on  the  remaining  parts  of  the  puzzle.  In  order  to  exploit 
this  constraint,  it  would  be  necessary  to  interface  SIVS  with  realistic  models  of  early  and  late 
processing.  I  haven’t  done  this;  I  chose  SIVS’s  domain  so  that,  although  it  was  of  practical 
use  in  a  broader  research  program,  I  did  not  have  to  implement  early  vision  or  general  object 
recognition.  However,  this  paper  specifies  the  interfaces  between  the  intermediate  processes  I 
implemented  and  early  and  late  vision;  section  8  discusses  some  remaining  difficulties. 

This  paper  addresses  five  issues  that  arise  in  the  computation  of  nonlocal  properties  of 
images.  These  issues  are  sufficiently  fundamental  that  it  seems  that  any  intermediate  vision 
system  will  have  to  address  them;  they  are  relatively  unstudied  in  the  computational  vision 
literature,  however. 

•  Visual  processing  must  be  applied  selectively  to  task-relevant  regions  of  the  image. 

•  The  visual  system  must  therefore  be  able  to  find  regions  of  the  image  with  task-relevant 
properties. 

•  Visual  processing  must  be  serial  in  part,  with  various  operations  performed  in  sequence 
and  according  to  environmental  conditions. 

•  Thus,  the  system  must  be  able  to  keep  track  of  intermediate  results  of  visual  computations. 

•  The  enormous  variety  of  visual  tasks  suggests  that  visual  processing  must  allow  the  de¬ 
velopment  of  new  patterns  of  processing  in  response  to  new  task  requirements. 
SIVS addresses these issues with


•  visual attention, which restricts access to image properties to a "spotlight" of attention;

•  visual search, which can direct the attentional spotlight to regions of the image satisfying
particular criteria;

•  visual  routines,  task-  and  situation-specific  sequential  patterns  of  visual  processing; 

•  intermediate  objects,  image-centered  representations  for  intermediate  results  of  visual  rou¬ 
tines;  and 

•  visual  operators,  a  set  of  general-purpose,  relatively  abstract  primitives,  which  are  com¬ 
bined  to  form  visual  routines. 

These mechanisms have been proposed by others on psychophysical and speculative
computational grounds. However, many aspects of these proposed mechanisms have previously
been left vague. As has often been the case in cognitive science, a computer implementation
forced complete specification, thereby uncovering a variety of new issues. Engineering
considerations led to the development of new mechanisms which may or may not be found in
human vision. These issues and proposed mechanisms may now be subjected to psychophysical
scrutiny. Further, mechanisms such as visual search have typically been studied in isolation.
SIVS demonstrates that it is possible to integrate several such mechanisms to achieve synergistic
power.

Studies  linking  perception  and  action  have  been  rare  in  artificial  intelligence.  An  exception 
has  been  work  in  robotics,  but  the  vision  systems  used  there  have  tended  to  be  ad-hoc  and 
not psychologically motivated. SIVS is designed to support visually-guided activity in a
psychologically realistic way. This is important because the psychophysical studies on which the
proposed  mechanisms  are  based  were  conducted  on  isolated,  highly  specific,  artificial  tasks.  It 
was,  therefore,  possible  that  these  mechanisms  would  have  no  actual  use  in  a  broader  context, 
or  that  they  would  have  to  be  used  very  differently  under  ecologically  valid  circumstances.  That 
SIVS  is  able  to  support  complex  activity  in  a  realistic  domain  demonstrates  for  the  first  time 
that  these  mechanisms  are  of  practical  value. 

SIVS  is  part  of  a  larger  system  called  Sonja  [7].1  Sonja  integrates  advances  in  vision, 
natural  language  pragmatics,  and  action.  Sonja  plays  a  video  game  called  Amazon  modeled 
after a commercial arcade game. Its access to the game world is only via SIVS and via the
game’s  primitive  actions. 

Outline 

Section 2 describes visual attention, the ability to access subsets of the early retinotopic
representations. Neurophysiological and psychophysical evidence suggest that visual attention is
implemented with a mechanism that routes information from dynamically selected locations
to a central node; this device is a key locus of the reduction of retinotopic representations to
compact encodings.

1 The version of SIVS reported on here is improved in several respects over that of [7].

Section  3  describes  visual  search,  the  ability  to  find  locations  in  an  image  that  have  specified 
properties.  Visual  search  has  been  shown  psychophysically  to  depend  on  visual  attention; 
in many cases it proceeds by serially enumerating and testing locations. SIVS is the first
implemented  system  that  models  the  psychophysically  demonstrated  properties  of  human  visual 
search. 

Section  4  describes  visual  routines,  patterns  of  applications  of  particular  visual  operations 
over time. Visual routines can do geometrical work, such as finding the smallest or leftmost
item  in  a  collection,  and  topological  work,  such  as  determining  connectedness  or  containment. 
The  notion  of  visual  routines  was  proposed  by  Ullman  [60],  and  SIVS  draws  heavily  on  his 
work.  His  proposal  is  sketchy  in  many  respects,  however;  this  paper  extends  it  and  renders  it 
specific.  Although  other  researchers  have  worked  in  the  visual  routines  framework  [28,  45],  no 
one  has  previously  produced  an  implementation  complete  enough  for  application. 

Section  5  describes  intermediate  objects,  which  are  used  to  keep  track  of  the  intermediate 
results  of  visual  routines  as  they  proceed.  There  are  four  sorts  of  intermediate  objects,  called 
markers, lines, rays, and activation planes.

Section  6  describes  visual  operators,  hypothesized  bits  of  brain  hardware  which  perform 
particular  sorts  of  visual  work.  Visual  routines  are  sequences  of  activations  of  visual  operators; 
the  theory  hypothesizes  that  there  is  a  small,  innate  set  of  operators.  Three  typical  visual 
operators  find  the  distance  between  two  points,  track  a  moving  object,  and  find  the  extent  of  a 
homogeneous  image  region.  Section  6  describes  the  specific  set  of  visual  operators  used  in  SIVS 
and  the  criteria  for  choosing  them. 

Section 7 describes the use of SIVS in guiding activity. Practical use demonstrates that
SIVS  is  adequate  to  support  complex  activity  in  a  realistic  domain.  Visually  guided  activity  is, 
further,  an  interesting  problem  in  its  own  right;  and  its  requirements  differ  from  those  of  object 
recognition  and  other  well-studied  late  visual  tasks. 

Section  8  presents  conclusions,  evaluating  SIVS  and  describing  successes  and  outstanding 
problems. 

2  Visual  attention 

Visual  attention  is  the  ability  to  differentially  apply  visual  processing  to  a  subset  of  a  scene. 
It  is  taken  as  consisting  of  two  components:  overt  visual  attention,  or  gaze  direction,  which 
can  be  observed  with  an  eye  tracker;  and  covert  visual  attention,  which  is  neurally  mediated 
and so can only be observed indirectly. This paper is concerned only with covert attention; [7]
briefly  describes  how  overt  attention  might  be  incorporated  into  the  model.  Following  standard 
usage,  I  will  use  “visual  attention”  to  mean  covert  visual  attention  when  this  will  not  result  in 
confusion. 

There  are  large  psychophysical  and  neuroscientific  literatures  on  visual  attention;  I  will 
review  some  of  this  literature  in  this  section.  While  the  data  are  uncertain  and  sometimes 
contradictory, there is broad agreement about some general facts.

The primary evidence for covert visual attention comes from psychophysical studies, for
instance those of Posner et al. [40]. In a typical experiment, subjects are required to react to
an event such as a light coming on somewhere in the visual field. The results of such experiments
are  that 

•  Reaction  times  are  lower  when  the  subjects  are  told  where  in  the  field  the  event  will 
occur,  suggesting  that  visual  resources  can  more  effectively  be  brought  to  the  detection 
task  when  the  location  of  the  event  is  known. 

•  Covert  visual  attention  is  independent  of  (overt)  gaze  direction:  it  operates  even  when  the 
subjects  do  not  foveate  the  indicated  location  [36,  40],  and  brain  lesions  that  eliminate 
voluntary  eye  movements  do  not  affect  covert  attention  [41].2 

• Visual attention is at least partly cognitively penetrable and under voluntary control; the
event location can be indicated by non-natural cues [40].

• The bulk of the evidence suggests that attention can be directed only to a single contiguous
subset of the image [13, 40].3

• The diameter of the attended subset can be varied voluntarily [13, 22, 53, 56]. The possible
shapes it can assume and the distinctness of its margin are uncertain [53].4

These  observations  are  summarized  as  the  spotlight  model  of  attention,  in  which  attention 
“illuminates”  a  chosen  subset  of  the  image.  The  precise  nature  of  these  subsets  is  unclear,  so  I 
will  refer  to  them  neutrally  as  “locations”;  I  will  discuss  them  further  in  section  3.5. 

Covert  attention  interacts  closely  with  early  vision.  Therefore  I  will  summarize  necessary 
background  concerning  early  vision  before  proceeding.  The  retina  is  a  two-dimensional  array  of 
light  sensors.  Neuroscientific  study  shows  that  the  parts  of  the  brain  to  which  it  is  immediately 
connected  preserve  this  retinal  topology  [31].  This  retinotopic  neural  processing  comprises  early 
vision.  Early  vision  is  coming  to  be  quite  well  understood  [20,  30,  31].  In  addition  to  being 
retinotopic,  early  vision  is  bottom-up  and  applies  uniformly  and  in  parallel  over  the  image. 
Bottom-up  visual  processing  is  that  which  depends  only  on  the  retinal  image.  A  process  is 
bottom-up  if  and  only  if  the  same  computation  occurs  whenever  the  same  image  is  presented. 
Bottom-up  processing,  thus,  cannot  depend  on  any  non-visual  contextual  factors,  on  memories 
or  other  state,  or  on  the  agent’s  intentions.  Early  vision  produces  “unarticulated”  output 
representations,  typically  a  single  continuously  variable  or  boolean  value  at  each  point  in  the 
image.  Early  visual  processing  is  performed  by  a  set  of  fixed,  innate,  retinotopically  organized 
machines  called  early  maps.  The  identity  and  function  of  some  of  these  maps  are  known;  new 
maps  are  still  being  discovered  and  the  functional  properties  of  some  remain  to  be  determined. 

2 There may be weak interactions between covert and overt attention. Krose, for instance, presents evidence
that the detectability of a “T” in a background of “L”s is a function of retinal eccentricity [25]. Other researchers
(such as Nakayama and Mackeben [36, p. 1639]) have failed to find such effects.

3 Earlier studies by Shaw and Shaw [16, 19] suggesting that attention can be split over arbitrary subsets of
the image have not been confirmed by more recent work; but see Driver and Baylis [10].

4 Eriksen and St. James [13] present evidence for an indistinct margin, with processing efficiency decreasing
gradually from the center. Farah’s results [14] suggest that attention can be directed to oddly-shaped regions,
but this may instead be the result of an unrelated activation operation (see section 4).

There appear to be roughly fifteen maps; among them are ones that compute color, edge
orientation, stereoscopic depth, and various properties of motion such as speed, direction, and
size change.5

Koch  and  Ullman  [24]  have  proposed  an  addressing  pyramid  as  the  hardware  supporting  the 
attentional  spotlight.  (This  proposal  was  inspired  by  neuroanatomical  speculations  of  Crick 
[8]; similar proposals have been made by Anderson and Van Essen [2], Treisman and Gormican
[55], Tsotsos [58], and others.) The addressing pyramid is similar in function to the addressing
hardware of a conventional serial computer: it routes information from a selected part of a
peripheral array to a central location. In the case of a conventional computer, this information
is the contents of a memory location; in the case of the attentional hardware, it is the contents
of the early representations in the attended location in the retinotopic array. The pyramid
gets its name from its two-dimensional hierarchical tree organization. It consists of a series of
exponentially smaller stacked layers that route information upwards to a central node (figure 2).
Each level is composed of an array of nodes, each of which selects one of the nodes beneath it
to route to its superior. Thus the system as a whole acts as a recursive winner-take-all network
[17], eventually routing the contents of just one leaf node up to the root. These leaf nodes
actually each contain the values of the early representations at one retinotopic location. I will
describe how pyramid nodes choose among their subnodes in section 3.

A  spotlight  of  variable  diameter  can  be  implemented  by  having  some  of  the  nodes  send  up 
a combination of the values of their inferiors, rather than choosing a single one. This has been
suggested  by  Treisman  and  Gormican  [55],  who  propose  that  interior  nodes  in  the  pyramid  can, 
selectively,  pass  up  the  average  of  the  early  values  of  their  inferiors,  rather  than  passing  up  the 
exact  value  of  a  single  chosen  inferior. 

This  addressing  pyramid,  then,  is  a  key  locus  of  the  “collapse”  of  retinotopic  representations 
into  compact  encodings  that  is  criterial  of  intermediate  vision.  Koch  and  Ullman  propose 
implementing  the  pyramid  in  terms  of  a  circuit  of  neuron-like  elements;  SIVS  follows  this 
proposal closely. I will describe the implementation further in section 3.
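
As an illustration only, the routing behavior of the addressing pyramid can be sketched in Python. The nested-dictionary tree, the function name `route`, and the `select`/`average` fields are assumptions made for this sketch; SIVS itself follows Koch and Ullman's circuit of neuron-like elements, which is not reproduced here.

```python
from statistics import mean

def route(node):
    """Return the value a pyramid node passes up to its superior.

    A leaf is a plain number standing in for the early values at one
    retinotopic location. An interior node is a dict with "children" and
    either "select" (index of the chosen inferior) or "average": True.
    This encoding is an assumption of the sketch.
    """
    if not isinstance(node, dict):
        return node                                   # leaf: its own early value
    if node.get("average"):
        # Widened spotlight: pass up the average of the inferiors' values.
        return mean(route(child) for child in node["children"])
    # Narrow spotlight: winner-take-all, route exactly one inferior upward.
    return route(node["children"][node["select"]])

# Hypothetical two-level pyramid in the spirit of figure 2.
pyramid = {
    "select": 0,
    "children": [
        {"average": True, "children": [1.0, 3.0, 5.0, 7.0]},
        {"select": 2, "children": [9.0, 9.0, 2.0, 9.0]},
    ],
}
print(route(pyramid))
```

With the root selecting its first inferior, the averaged value of the four attended leaves reaches the root; switching the root's `select` to 1 would instead route a single leaf value from the second group.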


5 Early work by Zeki [63, 64] suggested a one-to-one correspondence between retinotopic maps and types of
visual information. It is now known that the correspondence is many-to-many [31]. I will ignore this observation
for simplicity.

Figure 2: The addressing pyramid. Leaf nodes (solid circles) contain buses compactly encoding
early  properties.  Interior  nodes  (open  circles)  pass  information  from  their  inferiors  up  to  their 
superior.  Here  the  encircled  region  (containing  four  leaf  nodes)  is  addressed.  The  interior  node 
immediately  above  this  region  passes  up  the  average  of  the  four  leaf  values,  rather  than  selecting 
one.  The  other  interior  nodes  select  just  one  of  their  inferiors  to  pass  up  early  properties  from. 
Selected communication paths are drawn as solid lines, deselected paths as dashed ones.

3 Visual search


Visual  search  is  the  process  of  finding  locations  in  the  image  which  have  specified  properties. 
Visual search has been extensively studied psychophysically; for surveys, see Julesz [22] and
Treisman  and  Gelade  [54].  In  most  psychophysical  experiments,  the  sorts  of  properties  searched 
for  are  very  simple:  “is  red,”  for  example,  rather  than  “is  a  chair  of  some  sort.”  Restricting 
attention  to  such  simple  properties  has  made  it  possible  to  isolate  the  mechanisms  that  probably 
underlie  more  complicated  sorts  of  search.  Fortunately,  in  Amazon  (the  videogame  domain 
SIVS  has  been  applied  to),  these  simple  properties  are  sufficient  to  locate  the  objects  that 
are relevant to any task. Thus it was possible in SIVS to implement the psychophysically
demonstrated mechanisms without much speculative extension.

In Sonja visual search is a means to an end, as well as an object of study in itself.
Psychophysicists have principally studied visual search in isolation and under artificial conditions.
This has left open questions about the interface between these mechanisms and other visual and
non-visual  processes.  For  example,  questions  about  the  interaction  between  visual  search  and 
segmentation  that  must  be  answered  to  fully  specify  an  implementation  have  gone  unasked.  (I’ll 
take  this  point  up  in  section  3.5.)  More  seriously,  the  role  of  visual  search  in  broader  activity 
has not been addressed. SIVS integrates the visual search mechanisms discovered
psychophysically with other visual processes, and (as we’ll see in section 7) Sonja further integrates all
these  visual  processes  with  action  to  achieve  concrete  ends. 

This section first explains the psychophysical properties of visual search and the brain
architecture they suggest, then explains SIVS’s implementation of that architecture and compares
it  with  related  computational  implementations. 

3.1  Psychophysics  of  visual  search 

The central result of the visual search literature, due to Treisman and her colleagues [54, 55],
concerns  the  distinction  between  parallel  and  serial  self-terminating  search.  The  experimental 
paradigm  motivating  this  distinction  examines  the  time  required  to  determine  whether  or  not 
an  object  with  specified  properties  exists  somewhere  in  an  artificial  scene.  The  results  depend 
on  the  nature  of  the  property  and  also  on  what  objects  are  found  in  the  scene  (figure  3).  Tasks 
varying  on  these  dimensions  segregate  strongly  into  two  classes.  In  the  first  class,  the  time 
required is independent of what is in the scene. In the second class, time is a linear function
of the number of “distractor” objects in the scene, and on average is twice as long when the
object to be found is absent as when it is present. Treisman interprets the
first class as indicating that certain properties are computed in parallel and in constant time
over  the  entire  visual  field.  In  cases  in  which  the  desired  property  is  one  of  those  computed 
this  way,  determining  whether  or  not  any  object  with  the  property  is  present  can  be  computed 
in  constant  time  as  a  global  OR  over  the  resulting  retinotopic  map.  The  object,  if  present,  is 
said  to  “pop  out”  of  the  display,  and  such  properties  are  called  pop  out  properties.  Treisman 
interprets  the  second  class  as  indicating  that,  in  cases  where  properties  are  not  computed  in 
parallel,  visual  attention  must  be  applied  sequentially  to  each  location  in  the  field  to  determine 
whether or not it has the desired property. In these cases if a single object of the desired type
is present, on average half the objects in the field will be examined before it is found; if one is
not present, every object in the field must be examined. This “serial self-terminating search”
accounts neatly for the reaction time data.

Figure 3: Psychophysical displays requiring, respectively, parallel and serial search. Determining
whether or not there is a horizontal line or whether or not there is a dashed line in a display
such as the first one takes time independent of the number of objects. Determining whether
there is a horizontal dashed line among vertical dashed and horizontal solid lines (as in the
second display) requires serial search and takes time linear in the number of objects.
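
The arithmetic behind the serial self-terminating account can be checked with a short Python sketch. It assumes, for illustration, that the target is equally likely to occupy any position in the enumeration order; the function name `expected_examined` is hypothetical.

```python
from fractions import Fraction

def expected_examined(n_objects, target_present):
    """Expected number of locations examined by serial self-terminating search.

    Assumes the target, if present, is equally likely to be at any position
    in the enumeration order (an assumption of this sketch).
    """
    if not target_present:
        return Fraction(n_objects)            # every object must be examined
    # A target at position k (1-based) costs k examinations.
    return Fraction(sum(range(1, n_objects + 1)), n_objects)

for n in (4, 8, 16):
    present = expected_examined(n, True)
    absent = expected_examined(n, False)
    print(n, float(present), float(absent))
```

Under this assumption the target-present cost is (n + 1)/2 and the target-absent cost is n, so both grow linearly with the number of objects and the absent case has twice the slope, matching the reaction time pattern described above.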

Given  this  paradigm,  we  can  ask  what  features  pop  out.  Treisman  and  Gormican  [55] 
report  that  colors,  grey  level,  line  curvature,  line  orientation,  line  length,  line  ends,  directions 
of  motion,  stereoscopic  depth  differences,  and  the  proximity  and  numerosity  of  clusters  of  lines 
are  pop  out  properties.  These  results  are  particularly  interesting  because  there  is  convergent 
neurophysiological  evidence  for  early  retinotopic  representations  of  many  of  these  properties 
[30,  31].  On  the  other  hand,  intersection,  line  juncture,  angle,  connectedness,  containment,  and 
aspect  ratio  are  not  pop  out  properties.  Neither  are  conjunctions  of  pop  out  properties. 

Treisman’s results have been replicated by many other researchers. Recently, some conflicting
data and alternative explanations have been put forth [35, 57, 62]. I have adopted Treisman’s
model as it is the most generally accepted; new empirical results may force modifications.6

3.2  An  architecture  for  visual  search 

These psychophysical results suggest an architecture like that of figure 4. Early modalities
compute retinotopic maps bottom up.7 Let us say that an early property consists of a dimension
(which is a particular early modality) and a value on that dimension. Thus color is a dimension
and red is a value. Each retinotopic map is retinotopically connected to an activation map.
An activation map acts as a value filter; it has binary elements which are “on” at points
where corresponding elements in the early map have the desired value.8 A network extending
globally over the activation map distributes the desired value to all the activation elements.
(An alternative implementation would use a separate activation map for each early value. This
would correspond to value unit encoding, which seems to be the rule for cortical neurons [3].)
Another global network computes the global OR (figure 5).9

Figure 4: An architecture for visual search. In this example the retina is presented with
lines varying in orientation (horizontal, vertical) and color (symbolized by solid and dashed).
Two early maps compute these properties. Activation maps compute whether a desired value
(vertical or dashed, input from the left) is present in the corresponding early maps at each
point. A global OR (output on the right) supports parallel search. The addressing pyramid
supports serial search, routing to the root (and thereby combining) all the early properties
corresponding to a particular addressed location (the lower left in this case). I have omitted
most lines connecting to pyramid leaf nodes to reduce visual clutter.

Figure 5: Structure of the activation maps. At each point the early value is compared with the
desired value to give a boolean activation value. A global OR of activation values is computed
over the entire map.

6 For example, the results of Wolfe et al. [62] suggest that the activation maps described in the next section
should be continuously graded, rather than binary.

7 This is an abstraction from neuroscientific results, which show that retinotopic maps are actually computed
in a cascade of stages [31]. The work of Moran and Desimone [32] suggests further that these stages are probably
interwoven with the attentional pyramid: they found increasing effects of visual attention on the receptive fields
of neurons in successively later areas of the visual cortex.

8 Activation maps are not part of Treisman's original model, but seem necessary to avoid searching blank
areas. Similar mechanisms have been proposed previously [24, 62].

9 Alternatively, as Treisman and Gormican [55] have suggested, the global OR could be computed using the
addressing pyramid by adjusting the diameter of the attended area to span the entire visual field.

This much machinery is sufficient for tasks that require deciding whether or not there is a
location  in  a  scene  which  has  a  particular  early  property,  and  therefore  accounts  for  parallel 
visual  search.  What  about  combinations  of  early  properties?  A  straightforward  solution  would 
be  to  provide  activation  maps  for  all  possible  combinations.  There’s  a  good  engineering  reason 
not  to  do  this:  there  are  too  many  combinations.  Assuming  that  there  are  a  dozen  early 
maps, there would be 2^12 = 4096 combination maps. (Value unit encoding of the individual
activation  maps  would  increase  the  exponent  substantially.)  Since  retinotopic  maps  each  take 
up  a  significant  chunk  of  cortex  [31],  this  is  infeasible.  An  alternative  to  this  proliferation  of 
maps  is  serial  application  of  visual  attention,  as  proposed  by  Treisman  and  others. 
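
A minimal Python sketch of an activation map with its global OR, together with the combination count, may make this concrete. The list-of-lists map representation, the function names, and the sample values are all assumptions made for illustration; they do not describe the SIVS data structures.

```python
def activation_map(early_map, desired_value):
    """Boolean map that is on wherever the early map holds the desired value."""
    return [[value == desired_value for value in row] for row in early_map]

def global_or(activation):
    """True if any element of the activation map is on (parallel search)."""
    return any(any(row) for row in activation)

# Hypothetical 2x3 early maps for two dimensions; None marks a blank point.
orientation = [["h", "v", None], ["h", None, "h"]]
color = [["red", "red", None], ["blue", None, "red"]]

vertical_present = global_or(activation_map(orientation, "v"))
green_present = global_or(activation_map(color, "green"))
print(vertical_present, green_present)

# A dedicated map for every combination of a dozen early maps is one map
# per subset of the twelve dimensions.
n_early_maps = 12
print(2 ** n_early_maps)
```

The first two results answer single-property queries in one pass over a map, which is the parallel case. The final count is why the same approach is not extended to conjunctions.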

Serial  visual  search  requires  enumerating  candidates  and  testing  to  see  if  they  have  the 
desired  property.  This  enumeration  must  be  performed  under  the  direction  of  some  external 
system,  which  I  will  refer  to  as  the  control  system  and  whose  internal  structure  is  outside  of 
the  scope  of  this  paper.  Various  enumeration  schemes  are  possible;  I  will  propose  a  simple 
one  which  matches  psychophysical  results.  To  enumerate  candidates,  you  pick  one  of  the  early 
dimensions  involved  in  the  compound  desired  property  and  enumerate  all  the  locations  that 
have  the  desired  value  on  that  dimension.  For  instance,  if  you  are  looking  for  a  vertical  blue 
edge,  you  can  enumerate  all  the  vertical  locations  and  check  if  they  are  blue  or  enumerate  all 
the  blue  locations  and  check  if  they  are  vertical.  Enumeration  involves  repeated  application 
of two primitives, content addressing and return inhibition, which affect the way nodes in
the  addressing  pyramid  select  among  their  subnodes.  In  the  remainder  of  this  section  I  will 
describe  content  addressing,  candidate  testing,  and  return  inhibition,  and  show  how  they  can 
be  combined  into  an  algorithm  for  visual  search. 

In  content  addressing,  the  control  system  specifies  an  early  dimension  and  value,  and  the 
addressing  pyramid  routes  the  early  properties  of  an  arbitrary  location  with  that  value  on 
that  dimension  to  the  root.  This  is  accomplished  via  the  activation  maps:  the  control  system 
supplies  the  desired  value  to  the  relevant  activation  map  and  specifies  that  activation  map  as  the 
relevant  one  for  content  addressing.  Pyramid  leaf  nodes  disqualify  themselves  from  the  selection 
process  if  the  corresponding  activation  map  value  is  zero.  Disqualification  propagates  upwards; 
an internal node is disqualified if all its subnodes are disqualified. This system guarantees that
the  early  values  routed  to  the  root  node  correspond  to  a  location  in  the  image  whose  activation 
value  is  one,  and  thus  which  has  the  desired  early  value  on  the  specified  early  dimension.  If 
there  is  no  activated  element  in  the  specified  activation  map,  the  root  node  itself  is  disqualified; 
this  corresponds  to  search  failure. 
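
The upward propagation of disqualification can be sketched as a recursive function in Python. The tree encoding (leaves as dictionaries, interior nodes as lists) and the name `content_address` are assumptions of this sketch, and taking the first qualified subnode merely stands in for the arbitrary choice made by the winner-take-all circuitry.

```python
def content_address(node):
    """Route the early properties of some activated leaf toward the root.

    A leaf is a dict with "active" (its activation-map value) and
    "properties" (its early values); an interior node is a list of
    subnodes. Returns None when the node is disqualified.
    """
    if isinstance(node, dict):
        # A leaf disqualifies itself if its activation value is zero.
        return node["properties"] if node["active"] else None
    for subnode in node:
        routed = content_address(subnode)
        if routed is not None:
            return routed          # stand-in for an arbitrary winner
    return None                    # all subnodes disqualified

# Hypothetical pyramid filtered for orientation == "v".
tree = [
    [{"active": False, "properties": {"orientation": "h", "color": "red"}},
     {"active": False, "properties": {"orientation": "h", "color": "blue"}}],
    [{"active": False, "properties": {"orientation": "h", "color": "red"}},
     {"active": True, "properties": {"orientation": "v", "color": "blue"}}],
]
found = content_address(tree)
print(found)
```

A return value of `None` at the root corresponds to search failure: no element of the specified activation map was on.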

For  purposes  of  visual  search,  the  root  of  the  addressing  pyramid  is  connected  to  circuits 
which  determine  whether  the  early  properties  presented  there  are  the  desired  ones.  In  the  worst 
case,  this  would  require  a  circuit  for  each  of  the  few  thousand  possible  combinations  of  features. 
These  circuits,  operating  on  compact  encodings  rather  than  retinotopic  representations,  would 
easily  fit  into  a  small  chunk  of  the  brain.  Having  just  one  copy  of  these  circuits,  rather  than  a 
copy  at  each  retinotopic  location,  is  a  tremendous  hardware  savings.  Further  hardware  savings 
can  be  realized  by  computing  only  those  combinations  of  early  properties  that  are  actually  of 
interest. 

Suppose the currently attended-to location does not have the desired properties; we must
reject it and find another. Return inhibition, when applied, prevents the currently addressed
location  from  being  considered  a  candidate  in  future  content  addressing.  This  allows  candidates 
to  be  enumerated  uniquely.  Return  inhibition  requires  that  each  leaf  node  in  the  pyramid 
keep a state bit which says whether or not it has been inhibited; inhibited nodes disqualify
themselves.  Klein  [23]  and  Posner  et  al.  [41]  present  psychophysical  evidence  for  the  reality  of 
return  inhibition. 

In  summary,  an  algorithm  for  serial  self-terminating  search  in  the  proposed  architecture 
goes  as  follows.10 

1.  Pick  one  of  the  conjoined  early  properties.  Set  the  activation  map  for  this  property's 
dimension  to  filter  for  this  value. 

2.  Use  content  addressing  to  find  an  activated  location  in  the  image.  If  there  is  none,  return, 
signalling  failure.  Otherwise,  the  addressing  pyramid  will  map  the  early  properties  of  the 
found  location  to  the  root. 

3.  Check  whether  the  addressed  location  has  the  desired  combination  of  early  features.  If 
so,  return,  signalling  success. 

4.  Otherwise,  inhibit  return  to  the  currently  addressed  location.  This  means  that  future 
content  addressings  will  find  different  locations.  Go  to  step  2. 
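
The four steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not the SIVS implementation: locations are held in a flat list rather than a pyramid, the inhibition bits are kept in a set, and the names `serial_search`, `filter_dimension`, and `desired` are hypothetical.

```python
def serial_search(locations, filter_dimension, desired):
    """Serial self-terminating search for a conjunction of early properties.

    locations: list of dicts mapping dimension name -> early value.
    desired:   dict giving the sought value on each conjoined dimension.
    Returns the index of a matching location, or None on failure.
    """
    filter_value = desired[filter_dimension]   # step 1: set the activation filter
    inhibited = set()                          # return-inhibition state bits
    while True:
        # Step 2: content addressing finds an activated, uninhibited location.
        candidates = [i for i, loc in enumerate(locations)
                      if loc.get(filter_dimension) == filter_value
                      and i not in inhibited]
        if not candidates:
            return None                        # no activated location: failure
        index = candidates[0]
        # Step 3: test the full conjunction at the addressed location.
        if all(locations[index].get(d) == v for d, v in desired.items()):
            return index                       # success
        inhibited.add(index)                   # step 4: inhibit return, repeat

scene = [
    {"orientation": "v", "color": "red"},
    {"orientation": "h", "color": "blue"},
    {"orientation": "v", "color": "blue"},
]
hit = serial_search(scene, "orientation", {"orientation": "v", "color": "blue"})
miss = serial_search(scene, "orientation", {"orientation": "v", "color": "green"})
print(hit, miss)
```

In the example only the two vertical locations are ever examined. The horizontal blue one is filtered out by the activation map before the serial loop begins.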

Sonja  makes  extensive  use  of  this  algorithm  in  playing  Amazon. 

Treisman  and  Gelade  [54]  report  that  human  subjects  require  about  60ms  per  iteration  of 
the  address,  test,  inhibit  cycle.  This  corresponds  to  examining  seventeen  locations  per  second. 
SIVS’s  cycle  time  varies  because  it  was  implemented  on  a  serial  machine,  but  on  average  it 
examines  as  many  locations  per  second. 

3.3  Extensions 

This  section  presents  two  extensions  to  the  basic  visual  search  paradigm  which  proved  very 
useful  in  Sonja  but  which  are  only  weakly  supported  by  psychophysical  evidence.  The  first 
extension  allows  control  of  the  order  in  which  locations  are  enumerated;  the  second  allows 
attention  to  be  directed  to  locations  based  on  their  positions  in  the  image,  rather  than  on  their 
early  properties. 

Controlling  enumeration  order 

In  many  cases  it  is  useful  to  control  the  order  in  which  visual  search  enumerates  locations.  For 
example, domain knowledge often can tell you roughly where the sought location is likely to
be. 

Koch and Ullman [24], based on the psychophysical studies of Engel [11, 12], proposed a
proximity preference mechanism for their model of the attentional pyramid. Proximity
preference makes the location selected by the next content addressing as close as possible to the
currently selected location; it can be implemented with circuitry that enhances the activity of
units close to the selected unit in the winner-take-all network.

10 Tsotsos presents a similar algorithm [58]. His is more complicated because it involves shape matching.

SIVS  provides  a  related  form  of  proximity  preference.  It  allows  the  control  system  to  choose 
an  arbitrary  point  of  interest  and  causes  content  addressing  to  proceed  in  increasing  order  of 
distance  from  this  point.  This  mechanism  could  be  implemented  using  a  damped  spreading 
activation  starting  from  the  chosen  point  and  enhancing  winner-take-all  units  in  proportion  to 
the  proximity  to  that  point.  SIVS’s  implementation  actually  uses  explicit  distance-comparison 
circuits.  The  chosen  point  is  specified  using  a  visual  marker  mechanism,  explained  in  section  5. 
SIVS  also  includes  a  mechanism  that  constrains  visual  search  to  a  specified  region  of  the  image 
or  to  locations  lying  along  a  specified  line.  In  the  latter  case,  locations  may  be  enumerated  in 
order  along  the  line. 
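
The effect of SIVS's proximity preference on enumeration order can be illustrated with a short Python sketch. Sorting by Euclidean distance is only a stand-in for the explicit distance-comparison circuits mentioned above, and the function name and sample coordinates are hypothetical.

```python
import math

def enumerate_by_proximity(activated, point_of_interest):
    """Order activated (x, y) locations by increasing distance from a point."""
    return sorted(activated, key=lambda loc: math.dist(loc, point_of_interest))

# Hypothetical activated locations and a marked point of interest.
activated = [(5, 5), (1, 0), (0, 3)]
order = enumerate_by_proximity(activated, (0, 0))
print(order)
```

Content addressing combined with return inhibition would then visit the locations in this order, so a search guided by good domain knowledge tends to terminate early.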

These enumeration order extensions to visual search are based solely on efficiency
considerations. The only relevant psychophysical evidence I know of is due to Krose and Julesz [26],
who show that proximity preference does not always apply; this does not rule out its selective
application under external control. It would be easy to do experiments to discover whether the
human visual system has similar mechanisms. If not, people must do exhaustive searches in
situations in which SIVS does not; this would result in somewhat different attentional
performance.

Pointer  addressing 

In  addition  to  content  addressing,  SIVS  supports  pointer  addressing.  In  pointer  mode,  the 
control  system  can  direct  the  pyramid  to  address  an  arbitrary  (x,y)  retinotopic  location.  This 
requires  passing  addresses  downward  through  the  pyramid  and  inhibiting  nodes  not  addressed. 
The pyramid can also pass addresses upward, providing the control system with the image
coordinates  of  a  location  addressed  by  content. 

There  is  little  psychophysical  evidence  bearing  directly  on  the  question  of  whether  the 
human  visual  attention  apparatus  supports  pointer  addressing;  the  question  has  not  been  asked 
explicitly  before.  The  most  relevant  studies  ask  whether  or  not  people  can  direct  attention  to 
points  defined  indirectly,  for  example  as  “two  inches  to  the  left  of  the  big  X”.  Some  experiments 
have  been  done  along  these  lines,  but  the  results  are  inconclusive.  Krose  and  Julesz  [26]  present 
evidence  that  argues  against  such  addressing;  Posner  et  al.  [40]  and  Nakayama  and  Mackeben 
[36,  experiment  2]  present  evidence  that  argues  for  it.  Pylyshyn  [43]  argues  against  it  on  a 
priori  grounds. 

Whether  or  not  an  attentional  mechanism  supports  pointers  affects  possible  implementation 
strategies  for  other  sorts  of  visual  machinery.  For  example,  consider  the  problem  of  determining 
whether  one  location  in  an  image  is  to  the  left  of  another.  An  architecture  that  supports  pointers 
can  subtract  x  coordinates  to  answer  this  question;  an  architecture  without  pointers  must  do 
something  more  complicated. 

3.4 Related computational work

So far as I know, SIVS is the first implemented system to model the phenomena described in
the  psychophysical  visual  search  literature  I  have  discussed. 

Several other researchers have presented implementations of visual attention. These implementations
vary in their motivations, in the faithfulness with which they model psychophysical
results,  and  in  various  engineering  parameters.  Among  the  last  are  the  type  of  routing  network 
used  (retinotopic,  hierarchical,  or  all-points),  selective  enhancement  of  signals  from  attended 
locations  versus  selective  inhibition  of  signals  from  non-attended  locations,  and  whether  or  not 
regions  of  variable  diameter  can  be  addressed. 

Feldman and Ballard [17] provided the first suggestion I have found for a computational
implementation of covert attention. They intended both to model Treisman and Gelade’s [54]
psychophysical studies and to solve the connectionist crosstalk problem. They suggested using
a winner-take-all network; their discussion is abstract and apparently the suggestion was not
implemented. Koch and Ullman [24] similarly did not implement their proposed pyramid.

The earliest implementation I have found is due to Fukushima [19]. He used a hierarchical
winner-take-all network of connectionist units. His implementation seems to have been motivated
principally by engineering concerns; it does not try to model psychophysical results
accurately. For example, the attended subset of the image does not need to be contiguous.

Strong  and  Whitehead  [51]  present  an  implemented  model  of  overt  visual  attention  inspired 
by  Feldman  and  Ballard’s  work  and  similarly  intended  to  solve  the  crosstalk  problem. 

Mozer’s [33, 34] implementation of covert attention models psychophysical results better
than Fukushima’s; it can attend only to a single contiguous region. His winner-take-all network
is not implemented hierarchically (as a pyramid) but rather retinotopically. Koch and Ullman
argue that a hierarchical organization results in faster convergence of the winner-take-all
computation than would a locally connected network such as Mozer’s. Mozer implemented
addressing of continuously variable diameter regions; SIVS doesn’t. Mozer’s network operates by
selective enhancement of signals from the attended region, rather than by selective inhibition of
signals elsewhere (as do Fukushima’s implementation and SIVS). Neurophysiological results
suggest that the primate attentional system operates by selective inhibition.11

Ahmad  and  Omohundro  [1]  describe  an  implementation  with  a  contiguous,  variable  diameter 
spotlight  and  boolean  inhibition.  This  implementation  is  able  to  gate  signals  from  the  attended 
location  to  a  central  node  in  constant  time  by  connecting  the  central  unit  to  every  unit  in  a 
retinotopic array. This seems biologically implausible; SIVS uses a log(n) depth fan-in tree
instead. 
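
To make the depth claim concrete, the following Python sketch counts the layers a fan-in tree needs to funnel n retinotopic units into one central node. The function name and the choice of fan-in are assumptions for illustration; the text above states only that the depth is logarithmic in n.

```python
def fan_in_tree_depth(num_units: int, fan_in: int = 2) -> int:
    """Number of layers needed to funnel `num_units` inputs into a single node.

    Each node combines at most `fan_in` inputs, so the layer width shrinks by
    that factor per level, giving a depth of ceil(log_fan_in(num_units)).
    """
    if num_units < 1 or fan_in < 2:
        raise ValueError("need at least one unit and a fan-in of two or more")
    depth = 0
    width = num_units
    while width > 1:
        width = -(-width // fan_in)  # ceiling division
        depth += 1
    return depth
```

Under these assumptions, a 256 by 256 array with a fan-in of four per node needs eight layers, whereas the all-points scheme needs a single node with 65,536 direct connections.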

11  Moran  and  Desimone  [32]  found  that  neurons  in  area  V4  of  the  visual  cortex  whose  receptive  fields  (RFs) 
include  the  attended  location  respond  strongly  to  stimuli  at  this  location  and  weakly  elsewhere  in  the  RF,  but 
neurons  whose  RFs  did  not  include  the  attended  location  responded  strongly  to  stimuli  anywhere  in  the  RF. 
Tsotsos  [58]  argues  that  selective  inhibition  should  make  the  winner-take-all  network  converge  more  quickly. 

3.5 Open questions

Many  empirical  and  engineering  questions  concerning  this  architecture,  beyond  those  posed 
earlier  in  this  paper,  remain  open. 

Current psychophysical evidence does not answer many questions concerning return inhibition.
For example, is it applied automatically and uniformly, or selectively under control of
the  control  system?  SIVS  allows  the  latter.  How  are  locations  uninhibited?  SIVS  provides 
a  global  inhibition  reset  line  which  uninhibits  all  locations,  providing  a  clean  slate  for  a  new 
search.  Perhaps  individual  locations  can  be  uninhibited,  or  perhaps  inhibition  just  decays  over 
time.  Is  there  a  limit  on  how  many  locations  can  be  inhibited?  What  is  the  spatial  resolution 
for  inhibition?  It  cannot  be  the  case  that  enormous  numbers  of  locations  can  be  inhibited  with 
great  precision,  or  it  would  be  easy  to  count  patterns  of  many  dots  in  arbitrary  order. 
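
The two properties attributed to SIVS in this paragraph, inhibition applied under control of the control system and a global reset line, can be sketched in Python as follows. The class and function names are hypothetical. The sketch deliberately models no decay, no capacity limit, and no spatial resolution limit, since those are exactly the open questions.

```python
class ReturnInhibition:
    """Tracks which locations are currently excluded from content addressing."""

    def __init__(self) -> None:
        self._inhibited = set()

    def inhibit(self, location) -> None:
        """Inhibit one location; called only when the control system asks for it."""
        self._inhibited.add(location)

    def reset(self) -> None:
        """Global reset line: uninhibit every location at once."""
        self._inhibited.clear()

    def is_inhibited(self, location) -> bool:
        return location in self._inhibited


def serial_search(locations, matches, inhibition):
    """Visit each matching, uninhibited location once, inhibiting it afterwards."""
    visited = []
    for loc in locations:
        if matches(loc) and not inhibition.is_inhibited(loc):
            visited.append(loc)
            inhibition.inhibit(loc)
    return visited
```

Because inhibition persists between calls, a second search skips locations the first one visited until `reset` provides a clean slate.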

Just what are the “locations” which the attentional spotlight looks at, content addressing
finds,  and  return  inhibition  applies  to?  A  simple  hypothesis  is  that  locations  are  the  regions 
of  the  image  found  by  a  general-purpose  preattentive  segmentation  process  that  partitions  the 
image  into  relatively  homogeneous  regions.  There  is  much  psychophysical  evidence  that  at  least 
a  first-pass  segmentation  is  performed  bottom-up  [50,  53].  Psychophysical  studies  on  attention 
have  usually  controlled  out  segmentation  by  using  as  stimuli  displays  of  small  geometrical 
figures  neatly  separated  by  a  featureless  white  background,  so  little  is  known  about  interactions 
between  attention  and  segmentation.  More  study  of  this  interaction  would  be  valuable;  recent 
studies by Driver and Baylis [10] and by Farah et al. [15] support the hypothesis that attended
locations  are  preattentively  segmented  regions. 

4  Visual  routines 

Visual routines are time-extended patterns of visual processing. Many visual properties,
particularly topological properties such as connectedness and containment, are difficult or impossible
to compute using a single type of processing. Different sorts of processing must be applied
in sequence. Ullman’s visual routines paper [60] proposes that patterns of visual processing
be thought of as programs, or routines, whose primitive operations are parameterized types of
visual processing. The strength of this idea is that a small set of visual primitives can be
recombined into an infinite number of types of visual processing. The demands of visual analysis
are  very  diverse;  yet  given  the  right  set  of  operations,  it  may  be  possible  to  assemble  visual 
routines  capable  of  performing  whatever  sort  of  visual  work  is  necessary  for  a  new  task.  A 
vision  system  can  thus  be  thought  of  as  a  sort  of  programming  language. 

Figure 6: A visual routine for computing containment. Starting from a point marked with a
solid square (first picture), a wave of “activation” is spread (second picture). The wave stops
when it hits a boundary. Then, a “point at infinity” is tested to see whether it is activated
(third picture). In this case, the point at infinity is marked with another square; it is not
activated, and so the first square must have been inside a boundary.

As an example, Ullman describes a visual routine for computing whether or not a particular
point is contained within a closed curve in the image. This routine involves applying two
primitive operations. First, a “wave of activation” is propagated in the image, starting from
the point of interest, and expanding in parallel in all directions, but stopping when a boundary
is reached. (See figure 6.) This computation can be performed efficiently on a parallel
two-dimensional grid machine. Second, a “point at infinity” (any point that is for some reason
guaranteed to be outside any curve) is tested to see whether it has been activated. If it has,
we know that the activation has “leaked out” of any surrounding curve, and that the original
point is not in fact contained in a closed loop. If it hasn’t, we know that there is a containing
curve.
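
The two steps of the containment routine can be sketched in Python. This is a serial stand-in for what the text describes as a parallel wave on a grid machine. The grid encoding (1 for boundary cells, 0 elsewhere), the function names, and the choice of the top-left corner as the default point at infinity are assumptions made for illustration.

```python
from collections import deque


def spread_activation(boundary, start):
    """Return the set of cells reached by a wave from `start` that stops at boundaries.

    `boundary` is a list of rows; a truthy cell is part of a curve. The wave
    spreads to the four edge-adjacent neighbors, so it cannot slip diagonally
    between two boundary cells.
    """
    rows, cols = len(boundary), len(boundary[0])
    activated = {start}
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside_image = 0 <= nr < rows and 0 <= nc < cols
            if inside_image and not boundary[nr][nc] and (nr, nc) not in activated:
                activated.add((nr, nc))
                frontier.append((nr, nc))
    return activated


def is_contained(boundary, point, point_at_infinity=(0, 0)):
    """True if the wave started at `point` never reaches the point at infinity."""
    activated = spread_activation(boundary, point)
    return point_at_infinity not in activated
```

The set returned by `spread_activation` plays the role of the activated region; the second step is only a membership test on it.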

In  SIVS,  primitive  visual  operations  are  computed  by  visual  operators,  which  are  thought 
of  as  specific  pieces  of  neural  hardware.  Primitive  operations  and  operators  correspond  one-to- 
one:  although  it  is  logically  possible  that  individual  pieces  of  visual  hardware  could  compute 
several  distinct  operations,  the  proposed  primitive  operations  are  sufficiently  dissimilar  that  it 
seems  more  likely  that  functions  are  allocated  statically. 

In the most general case, illustrated in figure 7, a visual operator takes as input zero or
more  retinotopic  maps  and  zero  or  more  control  signals  which  determine  the  parameters  of  the 
operation  and  whether  or  not  it  is  actually  carried  out.  The  operator  encodes  its  results  on  zero 
or  more  compactly-encoded  output  buses.  The  operator  may  maintain  state,  typically  shared 
with  other  operators;  I  will  explain  the  form  of  this  state  in  section  5. 

Collectively,  the  set  of  visual  operators  constitute  a  visual  routines  processor  (VRP).  The 
operation of this VRP is directed by an external control system, probably the same one involved
in visual search, whose nature I will again leave unspecified. The interface between
the  VRP  and  the  control  system  consists  of  a  fixed  set  of  compact  buses,  with  queries  and 
commands  as  inputs  to  the  VRP  and  results  as  outputs.  This  organization  is  similar  to  that 
of  a  horizontally  microcoded  computer  (figure  8).  The  VRP  plays  the  role  of  the  datapaths; 
the  control  system  is  analogous  to  the  computer’s  control  logic.  As  in  some  recent  horizontally 
microcoded  architectures,  there  are  many  datapaths  all  of  which  operate  in  parallel  on  every 
cycle  (though  some  of  them  may  not  do  anything  useful,  if  there  is  no  computation  of  their  sort 
to  be  performed  on  that  cycle).  Thus  visual  routines  are  strictly  patterns  rather  than  simply 
sequences  of  visual  operations:  several  operations  may  occur  simultaneously.  On  each  tick  the 
control  system  computes  the  parameters  controlling  the  visual  operators;  this  is  analogous  to 
the  vector  of  control  bits  provided  by  a  horizontal  microinstruction. 
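
The microcode analogy can be sketched in Python as a single tick function. Everything named here (`vrp_tick`, the dictionary-shaped control word, and the two toy operators) is hypothetical and chosen only to show the shape of the interface: each operator receives its own field of the control word on every cycle and idles when that field is absent. A sequential loop stands in for operators that would run in parallel in hardware.

```python
from typing import Callable, Dict, Optional

Operator = Callable[[dict, dict], object]


def left_of(params: dict, state: dict) -> bool:
    """Toy boolean operator: compare the x coordinates of two named markers."""
    ax = state["markers"][params["a"]][0]
    bx = state["markers"][params["b"]][0]
    return ax - bx < 0


def move_marker(params: dict, state: dict) -> None:
    """Toy state-changing operator: place a named marker at a given location."""
    state["markers"][params["marker"]] = params["to"]


def vrp_tick(
    operators: Dict[str, Operator],
    control_word: Dict[str, Optional[dict]],
    state: dict,
) -> Dict[str, object]:
    """Run one cycle; operators with no control field do nothing useful this tick."""
    results = {}
    for name, operator in operators.items():
        params = control_word.get(name)
        results[name] = None if params is None else operator(params, state)
    return results
```

A visual routine is then a sequence of control words, one per tick, and any single control word may enable several operators at once.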

Figure 7: A generic visual operator. The visual operator, on the basis of control inputs, produces
compact outputs from the retinotopic maps. It may maintain some state, which can be shared
with other visual operators.

Figure 8: Overall modularity. A control system takes inputs from the visual operators and
produces outputs for them.

Depending on the primitive operations selected, on task requirements, and on the mechanisms
that determine when operations are performed, visual routines might be used in very
different ways. For Sonja, visual routines are a means for task-specific, top-down control of
visual processing in support of action. “Task-specific” means that visual routines are designed
to discover properties of domain situations that are meaningful in terms of the task the
agent is engaged in. “Top-down” means that visual routines are invoked on the basis of factors
other  than  just  the  currently  visible  image,  such  as  the  memories  and  purposes  of  the  agent.  In 
such  a  model,  you  might  have  task-specific  routines  for  checking  your  speedometer,  for  glancing 
at  your  keyboard  to  align  your  fingers  in  home  position,  for  checking  a  pancake  to  see  if  it 
is  ready  to  flip,  and  for  finding  safe  footing  when  walking  in  the  mountains.  These  tasks  are 
subtasks  of  other  tasks,  which  are  not  purely  visual,  but  which  are  guided  by  visual  feedback. 
In  section  7  we  will  see  a  complex  example  of  a  task-specific  visual  routine  that  guides  activity 
in  Sonja. 

These  commitments  to  task-specificity,  top-down  control,  and  visually  guided  activity  are 
not  necessary  correlates  of  the  visual  routines  architecture.  Visual  routines  might  instead 
be  used  as  part  of  a  bottom-up  object  recognition  system,  for  example,  or  they  might  be 
applied  uniformly  and  might  deliver  purpose-independent  representations  as  outputs.  SIVS’s 
architecture  might  carry  over  directly  to  such  applications  although  that  is  not  the  way  I  have 
used  it. 

I  will  postpone  discussion  of  the  specific  operators  in  SIVS  until  section  6.2,  after  explaining 
the  data  structures  the  operators  use  (described  in  section  5)  and  the  criteria  used  in  choosing 
the  operators  (described  in  section  6.1).  However,  their  purpose,  in  general  terms,  is  spatial 
analysis:  establishing  both  geometrical  and  topological  relationships  between  portions  of  the 
image. 

5  Intermediate  objects 

We  saw  in  section  4  that  visual  operators  can  access  state  variables.  This  state  is  required  to 
keep  track  of  intermediate  results  during  visual  routines.  We  have  seen  one  example  already: 
the  activated  region  that  is  computed  in  the  first  step  of  the  containment  routine  (figure  6). 
Lacking empirical constraint on the nature of these intermediate representations, I have adopted
Ullman’s  proposals,  which  were  based  on  computational  intuition,  and  extended  them  based  on 
my  own  computational  intuition.12  These  proposals  might  be  tested  psychophysically. 

SIVS’s visual operators manipulate four types of intermediate objects: markers, lines, rays,
and activation planes. These representations are shared across operators, rather than being
private to particular ones. Visual markers designate locations in the image. Lines (actually
directed line segments) run between two points; rays extend from a point to infinity. Activation
planes represent regions in the image. In figure 17, for instance, we see some markers, lines,
rays, and activated regions displayed graphically on top of a running Amazon game. The
reverse-video polygons represent markers, the drawn lines represent line and ray intermediate
objects, and the shaded regions represent activation planes. (These graphical representations
should not be confused with the intermediate objects themselves.) All four sorts of intermediate
objects are image-centered, representing only two-dimensional information. Three-dimensional
processing is beyond the scope of this paper; see [7] for ways to incorporate it.

12 The only previous implementation of visual routines, due to Romanycia [45], used only retinotopic
representations.

Figure 9: To determine whether there are four colinear points, you have to keep track of points
to apply the colinearity test to.

The interface between the control system and the VRP is in part in terms of the intermediate
objects. The control system can name intermediate objects; that is, it can pass compact
encodings which say which marker, line, ray, or activation plane to use in an operation.
Operators can, for instance, determine whether the distance between one pair of markers is greater
or  less  than  the  distance  between  another  pair;  draw  a  line  between  two  markers;  or  determine 
whether  a  marker  is  within  an  activated  region. 
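
The three example operations in this paragraph can be sketched in Python with intermediate objects referred to by compact names (small integers here). The container class, the function names, and the use of coordinate tuples and cell sets are assumptions for illustration, not SIVS's data structures.

```python
import math
from dataclasses import dataclass, field


@dataclass
class IntermediateObjects:
    """Shared state, indexed by the compact names the control system passes."""
    markers: dict = field(default_factory=dict)  # name -> (x, y)
    lines: dict = field(default_factory=dict)    # name -> ((x0, y0), (x1, y1))
    planes: dict = field(default_factory=dict)   # name -> set of activated (x, y) cells


def pair_farther_apart(objs, a, b, c, d) -> bool:
    """True if markers a and b are farther apart than markers c and d."""
    first = math.dist(objs.markers[a], objs.markers[b])
    second = math.dist(objs.markers[c], objs.markers[d])
    return first > second


def draw_line(objs, line, a, b) -> None:
    """Set the named line to the directed segment from marker a to marker b."""
    objs.lines[line] = (objs.markers[a], objs.markers[b])


def marker_in_region(objs, marker, plane) -> bool:
    """True if the named marker lies within the named activation plane's region."""
    return objs.markers[marker] in objs.planes[plane]
```

Note that only names cross the interface: the control system never handles coordinates or regions directly, which keeps its inputs and outputs compact.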

The  remainder  of  this  section  describes  in  turn  markers,  lines  and  rays,  and  activation 
planes. 

Markers 

Many  visual  operations  require  storing  locations.  For  example,  if  you  want  to  know  if  there 
are  four  colinear  points  in  an  image,  you  have  to  keep  track  of  points  to  apply  the  colinearity 
test  to  (figure  9).  Visual  markers  are  one  mechanism  for  keeping  track  of  locations.  The 
simplest  implementation  of  location  stores  would  be  registers  holding  image  coordinates.  Since 
stored  locations  are  typically  found  using  visual  search,  this  implementation  requires  that  the 
addressing pyramid be able to pass addresses in at least the upward direction.

Visual  markers  are  not  intended  as  a  complete  theory  of  the  ability  to  store  locations. 
Markers  can  only  represent  visible  objects,  whereas  people  can  remember  the  locations  of  objects 
that are no longer visible. Psychological evidence has led several researchers [16, 42, 43] to
propose  that  there  are  several  distinct  spatial  memory  mechanisms;  markers  might  be  one. 


Lines  and  rays 

Lines and rays in SIVS may not correspond to any “things” in the human visual system.
They were an easy interface for various useful visual operators which may well use some other
interface. Their principal use is to specify spatial limits for a search: SIVS has operators
that find things that lie along a straight line or a ray. Whether such limits can actually
be put on visual search is unknown but could readily be determined experimentally. One
way to implement them would be to “draw” the line on a retinotopic map that is ANDed
into  a  search  activation  map.  Then  only  locations  lying  on  the  line  would  be  candidates  for 
content  addressing.  Another  implementation  would  scan  attention  serially  along  a  line.  These 
implementations  could  be  distinguished  psychophysically  by  reaction  time  data. 
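
The first of these implementations, drawing the line on a map and ANDing it with the search activation map, can be sketched in Python. Bresenham's line algorithm is used here as one convenient way to rasterize the line; it is a substitution for illustration, not something the text specifies. Maps are represented as sets of integer cells rather than bit arrays, so the AND becomes a membership test.

```python
def rasterize_line(start, end):
    """Return the grid cells on the segment from `start` to `end`, in order.

    Uses Bresenham's line algorithm on integer (x, y) coordinates.
    """
    x0, y0 = start
    x1, y1 = end
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    cells = []
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return cells


def candidates_on_line(search_map, start, end):
    """AND the drawn line with the search map, keeping order along the line."""
    return [cell for cell in rasterize_line(start, end) if cell in search_map]
```

Because the result preserves order along the directed line, taking its first element gives the first matching location along the line, and reversing the endpoints reverses the enumeration.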

Activation  planes 

Activation  planes  are  used  to  keep  track  of  interesting  regions  of  the  image,  as  in  Ullman's 
routine  for  computing  containment.  They  can  be  naturally  implemented  as  retinotopic  bit 
arrays,  one  bit  array  per  plane;  bits  are  turned  on  at  points  that  are  within  the  region  of 
interest. 

Psychophysical  evidence  could  help  support  or  disconfirm  the  existence  of  activation  plane 
hardware.  I  know  of  only  one  relevant  study,  due  to  Farah  [14],  who  found  that  imaging  a 
complex  bounded  form  increased  the  detectability  of  events  within  the  bounded  region.  This 
effect was found to be similar to attending to a colored form of the same shape. An activation
plane would be a natural mechanism underlying this effect. Kosslyn has suggested an experiment
(described by Ullman [60] but apparently never performed) that would give a more direct
test. In it, facilitation of later inside/outside judgements by a first judgement would suggest the
existence  of  a  representation  of  the  extent  of  the  bounded  region. 

How  many  activation  planes  are  there?  I  know  of  no  psychophysical  studies  of  this  question. 
The  following  informal  observation  may  serve  as  the  basis  for  an  experiment.  When  staring 
at  floors  tiled  with  a  regular  pattern  of  identical  tiles,  I  find  that  I  can  cause  specific  subsets 
to  jump  out:  a  hollow  or  filled  square  or  hexagon  (depending  on  the  tiling  pattern)  or,  more 
interestingly,  disconnected  sets  like  alternate  tiles  (resulting  in  a  checkerboard  appearance). 
This  phenomenon  is  quite  striking  in  that  I  can  make  quite  elaborate  patterns  appear  globally 
over  a  space  of  hundreds  of  tiles.  The  jumping-out  tiles  almost  literally  appear  to  be  darker  or 
“colored.”  If  this  is  the  phenomenological  correlate  of  activation,  it  suggests  that  there  is  only 
one  activation  plane  available  for  this  purpose,  because  despite  much  effort,  I  am  completely 
unable  to  form  even  simple  patterns  that  divide  the  surface  into  three  sets  rather  than  two. 
Simple  psychophysical  experiments  might  decide  this  question.  In  any  case,  Sonja  uses  three 
activation  planes,  but  could  probably  get  by  with  timesharing  a  single  one. 

The  next  section  explains  how  intermediate  objects  are  manipulated  by  visual  operators. 

6 Visual operators

Visual  operators  are  to  be  thought  of  as  bits  of  hardware  each  dedicated  to  performing  a 
specific  visual  operation.  Because  visual  operators  support  intermediate  vision,  their  purpose 
is  to  give  compact  answers  to  compact  questions  about  non-compact  representations  such  as 
image  regions  and  activation  planes.  (Recall  figure  7.)  Because  the  range  of  visual  processing 
tasks  is  so  broad,  there  are  many  visual  operators  with  distinct  functions.  Individually  they 
may  not  do  much,  but  they  may  be  combined  by  serial  application  into  powerful  visual  routines. 
These  routines  involve  partial  results  which  are  kept  track  of  with  intermediate  objects. 

Section  6.1  explains  how  we  might  choose  and  evaluate  a  set  of  visual  operators;  section  6.2 
describes  the  specific  set  implemented  in  SIVS.  Section  7  provides  examples  of  the  combination 
of  these  operators  into  visual  routines. 

6.1  Criteria  on  visual  operators 

There are two sorts of criteria that bear on choosing perceptual operators: local criteria on
individual  operators  and  global  criteria  on  the  set  of  them. 

The local criteria I used in designing SIVS were that an operator be implementable in biologically
plausible hardware, general purpose, neurophysiologically and psychophysically plausible,
and  clean  from  an  engineering  standpoint. 

•  Each  operator  ought  to  be  implementable  in  biologically  plausible  hardware.  The  local 
connectivity  and  slow  processing  speed  of  neural  hardware  imply  that  visual  computations 
(which  typically  operate  in  less  than  a  second)  must  involve  no  more  than  about  a  hundred 
sequential  steps  [17]  and  make  certain  mechanisms  such  as  pointers  expensive — perhaps 
prohibitively  so.  Ideally  SIVS  would  implement  each  operator  as  a  network  of  neuron-like 
units.  However,  even  the  most  powerful  parallel  computers  available  today  would  not 
have  been  up  to  the  job  of  simulating  the  number  of  units  required,  and  so  I  implemented 
most  operators  with  conventional  serial  algorithms.  However,  I  will  sketch  massively 
parallel,  pointer-free  implementations  for  each  visual  operator,  thereby  arguing  that  the 
constraints  of  biological  plausibility  do  not  rule  them  out  a  priori. 

•  The  operators  should  be  general  purpose  in  two  senses:  they  should  not  depend  on  the 
specific  domain  in  which  the  system  is  tested,  and  they  should  be  useful  for  very  different 
sorts  of  tasks.  It  is  impossible  to  be  sure  without  doing  cross-domain  and  cross-task 
studies,  but  my  intuition  is  that  all  SIVS’s  operators  satisfy  this  criterion. 

•  Each  operator’s  membership  in  the  set  ought  to  be  supported  by  neurophysiological  and 
psychophysical  evidence.  This  is  not  true  in  SIVS.  Most  of  the  relevant  experiments 
have  not  been  done.  Many  of  the  operators  in  SIVS  suggest  tests  for  comparable  human 
performance. 

•  Finally,  I  used  straightforward  engineering  considerations  to  choose  many  of  the  operators. 
I used programmer’s intuition to judge whether the postulated operators involved seemed
clean. SIVS includes only one inelegant operator (explained in section 7), in a case in
which “doing the right thing” seemed like it would be a lot of work and not particularly
edifying.  I  am  reasonably  sure  that  this  operator  does  nothing  that  could  not  be  done 
cleanly. 

My  global  criteria  on  the  set  of  operators  were  that  the  set  span  the  space  of  visual  analysis 
tasks;  that  it  make  programming  visual  routines  easy;  and  that  it  make  learning  visual  routines 
easy.  I  will  postpone  evaluating  SIVS  according  to  these  criteria  until  section  8.2. 

• We want a “spanning” set: that is, a set of operators that together are sufficient for any
task. Here “any task” may mean “any psychologically plausible task” or, for engineers,
“any task in the class of domains of interest.” Thus the set of visual operators, when combined
into routines, are to form a finite means for the realization of an infinite collection
of  possible  visual  processes. 

•  A  set  of  operators  should  not  only  make  it  possible  to  implement  any  visual  task,  it 
ought  to  make  it  easy.  From  an  engineering  standpoint,  the  VRP  should  present  a  nice 
programming  system. 

•  A  set  of  operators  should  also  make  it  easy  to  learn  new  visual  routines. 

6.2 SIVS’s visual operators

This  section  describes  the  specific  visual  operators  SIVS  uses.  This  set  substantially  extends 
the set proposed by Ullman [60].

The subsections of this section correspond to a somewhat arbitrary categorization of operators
into six groups. The first group is concerned with visual attention and search; the
corresponding subsection, 6.2.1, fills in some details left vague in sections 2 and 3. Subsection
6.2.2 describes an operator for tracking moving objects. 6.2.3 describes operators concerned
with geometry: distances, directions, and angles. Subsection 6.2.4 describes operators for direct
manipulation of intermediate objects and 6.2.5 describes operators involving activation
planes;  together  these  can  be  used  to  determine  topological  properties  such  as  containment  and 
connectedness.  Subsection  6.2.6,  finally,  describes  a  housekeeping  operation  called  “blanking.” 

By  convention,  the  names  of  operators  that  change  the  state  of  intermediate  objects  end 
in an exclamation mark (content-address!), and those that implement boolean tests end in a
question mark (distance-within?).

6.2.1  Visual  attention  and  search 

I  have  explained  the  implementation  of  visual  search  in  section  3;  this  section  explains  the 
details  of  the  interface  between  it  and  the  control  system.  This  interface  is  in  terms  of  visual 
operators  which  encapsulate  the  addressing  pyramid,  thereby  integrating  it  into  the  architecture 
of  figure  8. 

Attention  supplies  early  properties  to  the  control  system.  I  gave  the  VRP  as  plausible  an 
interface  with  the  control  system  as  I  could,  but  made  no  attempt  to  make  the  connection 
between early and intermediate vision realistic. SIVS does no pixel-wise early processing.
Instead,  the  visual  operators  have  direct  access  to  the  data  structures  representing  video  game 
objects  that  Amazon  maintains  for  its  own  purposes.  I  will  talk  about  the  implications  of 
bypassing  early  vision  in  section  8.1. 

Thus, to implement content addressing, I needed a simulated implementation of early processing.
This implementation is in terms of seven early dimensions. I chose these dimensions
and the values assigned to video game icons along these dimensions somewhat arbitrarily; my
main concern was to ensure that some objects would pop out and that others would require slow
serial searches. Some of the dimensions are found in human early processing: grey level, speed
and direction of motion, line orientation, and perhaps overall size. Two others are arbitrary
and probably not biologically accurate: I called them “fiddliness,” corresponding roughly to
the amount of detail in the icon, and “boxiness,” corresponding to whether or not the icon is
roughly rectangular. Early properties are not computed at run time, but are fixed properties of
icons. The implementation does not model the internal structure of the icons; they are treated
as homogeneous blobs. SIVS takes icons to be the “locations” to which return inhibition applies;
see section 3.5.

The operator content-address! implements content addressing. It takes as inputs an enable
signal and an early (dimension, value) pair to find. Because one typically wants to keep track
of the addressed location, the operator takes a marker as an additional input; the marker is
moved to the center of the addressed location. Optionally content-address! can take as input
another marker representing the locus of proximity preference (section 3.3).

Three operators put spatial limits on content addressing. content-address-activated! requires
that the found location be within an activated region. scan-along-line! and scan-along-ray!
address the first location with a given early property found along a line or ray.

The operators inhibit-return! and uninhibit-return! perform the operations they are
named for and take only an enable signal as input.
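
The way content addressing and return inhibition combine into a serial search can be sketched
as follows. This is only an illustrative sketch in Python, not SIVS's actual interface: the names
Icon, content_address, and enumerate_matches are hypothetical, the inhibited flag is set directly
on the addressed icon rather than through an enable signal, and proximity preference is assumed
to mean "prefer the matching location nearest a given point."

```python
from dataclasses import dataclass


@dataclass
class Icon:
    """Hypothetical stand-in for a game icon with fixed early properties."""
    name: str
    center: tuple            # (x, y) pixel coordinates
    early: dict              # e.g. {"size": "medium", "fiddly": True}
    inhibited: bool = False  # set by return inhibition


def content_address(icons, dimension, value, proximity=None):
    """Return one uninhibited icon with the given early property, or None.

    If a proximity point is given, the nearest match is preferred (an
    assumption about how proximity preference behaves).
    """
    matches = [i for i in icons
               if not i.inhibited and i.early.get(dimension) == value]
    if not matches:
        return None
    if proximity is None:
        return matches[0]
    px, py = proximity
    return min(matches,
               key=lambda i: (i.center[0] - px) ** 2 + (i.center[1] - py) ** 2)


def enumerate_matches(icons, dimension, value):
    """Serial search: address a location, inhibit return to it, repeat."""
    found = []
    while (icon := content_address(icons, dimension, value)) is not None:
        found.append(icon)
        icon.inhibited = True      # plays the role of inhibit-return!
    for icon in icons:
        icon.inhibited = False     # plays the role of uninhibit-return!
    return found
```

A caller would test each enumerated icon for the remaining criterial properties and stop at the
first one that passes, so the cost of the search grows with the number of icons sharing the
addressed property.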

6.2.2  Tracking 

An  ability  to  track  several  moving  objects  simultaneously  is  very  useful  in  playing  video  games. 
My informal observations of human players suggest to me that people can track up to four or
five  moving  things  simultaneously;  Pylyshyn  and  Storm  [43, 44]  present  psychophysical  evidence 
for  about  the  same  numerical  limit  on  tracking.13 

SIVS  tracks  moving  objects  with  visual  markers.  The  operator  track!  causes  a  marker, 
passed as an input, to track the motion of the thing it marks. One copy of track! is
required for each thing that can be tracked simultaneously. SIVS supplies five copies, of which
Sonja uses only three. SIVS also provides disappeared? operators, which tell whether a tracked
thing has been lost.

13 Pylyshyn argues that this ability contradicts the finding that people can attend to only one location at a time.
No such contradiction is necessary, however, if attention is required to support access to all early properties. Full
access is not necessary for tracking, which needs access only to motion computations. The tracking hardware
probably has a separate, dedicated addressing scheme for a motion map.

Figure 10: Major and minor components of directions to the circled point. The major component
is drawn as a solid arrow and the minor component as a dashed one. In two cases (a and
b) the two components coincide.

In  the  human  visual  system,  tracking  presumably  works  by  segmenting  the  optical  flow 
field.  How  this  works  in  detail  is  unclear  (see  Thompson  and  Pong  [52]  for  an  explanation  of 
the  issues  and  some  computational  approaches)  but  there  is  no  doubt  that  the  human  visual 
system  supports  tracking  as  a  primitive  [46]. 

6.2.3  Distances,  directions,  and  angles 

SIVS provides a series of operators which compute geometrical relationships between intermediate
objects. These operators manipulate distances, directions, and angles. All of these operators
are implemented as numerical computations in terms of the pixel coordinates of intermediate
objects. Directions are coded as ordered pairs of the eight “king’s move” directions, a major
and minor component representing the nearest and next nearest of the eight directions to the
actual direction. (See figure 10.) This encoding provides sufficient resolution for the purposes
to which the system is put.
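
The major/minor encoding can be illustrated with a short Python sketch. The function name
direction_components and the direction labels are hypothetical, and the sketch assumes a
coordinate system in which y increases upward (screen coordinates usually run the other way, in
which case the "up" and "down" labels swap).

```python
import math

# The eight king's-move directions, listed counterclockwise from "right".
DIRECTIONS = ["right", "up-right", "up", "up-left",
              "left", "down-left", "down", "down-right"]


def direction_components(dx, dy):
    """Return (major, minor) directions for the displacement (dx, dy).

    The major component is the nearest of the eight directions and the
    minor component the next nearest. When the displacement lies exactly
    along one of the eight directions, the two components coincide.
    """
    if dx == 0 and dy == 0:
        raise ValueError("direction between coincident points is undefined")
    sector = (math.degrees(math.atan2(dy, dx)) % 360.0) / 45.0
    nearest = round(sector)
    offset = sector - nearest
    major = nearest % 8
    if abs(offset) < 1e-9:
        minor = major
    elif offset > 0:
        minor = (major + 1) % 8
    else:
        minor = (major - 1) % 8
    return DIRECTIONS[major], DIRECTIONS[minor]
```

For a goal far to the left and slightly below, for instance, this yields "left" as the major
component and "down-left" as the minor one.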

The operator distance-within? is a predicate on two markers and a distance; it tests whether
the distance between the two markers is greater or less than that given. greater-distance?
takes two pairs of markers and tells whether the distance between the first pair is more or less
than the distance between the second. markers-coincident? tells whether the distance between
two markers is zero.

marker-to-marker-direction tells the direction from one marker to another. aligned? tells
whether or not two markers are aligned in one of the eight directions.

Three operators, angle-ccw?, marker-line-angle-ccw?, and marker-ray-angle-ccw?, determine
whether or not an angle is counterclockwise; they respectively take three markers, a marker
and a line, and a marker and a ray as inputs.
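
One standard way to compute such a test numerically is the sign of a two-dimensional cross
product. The Python sketch below is an assumption about how the three-marker case could be
computed, not a description of the SIVS code: the function name angle_ccw is hypothetical, the
middle argument is assumed to be the vertex of the angle, and y is assumed to increase upward
(with y increasing downward the sense of rotation is reversed).

```python
def angle_ccw(a, vertex, b):
    """True if turning from the ray vertex->a to the ray vertex->b is counterclockwise.

    Points are (x, y) pairs. Collinear points give False.
    """
    ax, ay = a[0] - vertex[0], a[1] - vertex[1]
    bx, by = b[0] - vertex[0], b[1] - vertex[1]
    # The z component of the cross product is positive for a counterclockwise turn.
    return ax * by - ay * bx > 0
```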

Two operators determine whether locations (interpreted as Amazon icons) are adjacent to
each other. touching? takes as inputs a marker and a direction; it tells whether or not the
Amazon icon under the marker is touching something else on the specified side, or whether
there is free space in that direction. corner-free? similarly determines whether there is free
space adjacent to a given corner of an icon. These operators probably ought to be decomposed
into routines over more primitive operators that would shift attention to the indicated edge of
the icon and check for free space.

6.2.4 Marker, line, and ray manipulation

Two operators provide direct manipulation of marker positions. warp-marker! moves one marker
to be coincident with another. walk-marker! moves a marker a specified distance in a specified
direction. These are implemented by side-effects to the coordinate information that markers
maintain. Similarly, draw-line! takes two markers and a line and causes the line to extend
from one marker to the other, and draw-ray! extends a given ray from a given marker in a
given direction.

6.2.5  Activation 

The  primary  use  of  activation  planes  is  to  fill  (activate)  a  bounded  region.  Ullman  [60]  suggests 
that  what  counts  as  a  boundary  may  be  task-specific  (and  thus  is  presumably  supplied  as  a 
parameter  by  the  control  system).  Lacking  evidence  about  the  nature  of  this  information  in  the 
human  visual  system,  I  had  the  operator  take  a  disjunction  of  icon  types  that  are  allowed  in  the 
activated  region.  Ullman  suggests  further  that  short  gaps  in  surrounding  edges  may  optionally 
count  as  boundary  segments  for  the  activated  region,  and  SIVS  provides  an  optional  gap  filling 
facility.14 

Region filling is implemented by the operator activate-connected-region!, which takes a
marker to start spreading activation from, a plane to mark the region with, and the type
information about what will count as boundaries. The operator returns a single boolean value,
which tells whether activation succeeded or if the spreading would, rather, continue to infinity.

Activation can be efficiently implemented in a locally-connected retinotopic array of processors
[60]. You turn the activated? bit on in the processor corresponding to the marked point
from which activation begins, and then you repeatedly propagate activation: each activated
processor tells all its neighbors that it is activated; if they are boundary points they do nothing;
otherwise, they set themselves activated. Repeat until no new processor becomes activated.
Mahoney [29] and Shafrir [47] describe still faster divide-and-conquer activation algorithms.
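
The propagation scheme just described can be simulated serially. The following Python sketch is
illustrative only: the function name and signature are hypothetical, boundaries are given as an
explicit set of cells rather than as a disjunction of icon types, neighbors are assumed to be
the four edge-adjacent cells, and "spreading to infinity" is approximated by activation reaching
the edge of a finite grid.

```python
def activate_connected_region(start, boundary, width, height):
    """Spread activation from `start` over a width x height grid.

    `boundary` is a set of (x, y) cells that block activation.
    Returns (activated, bounded): the set of activated cells, and False
    for `bounded` if activation tried to leave the grid.
    """
    activated = {start}
    frontier = {start}
    bounded = True
    while frontier:                      # repeat until nothing new is activated
        newly_activated = set()
        for x, y in frontier:            # each activated cell tells its neighbors
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                n = (x + dx, y + dy)
                if not (0 <= n[0] < width and 0 <= n[1] < height):
                    bounded = False      # activation ran off the grid
                    continue
                if n in boundary or n in activated:
                    continue             # boundary points do nothing
                newly_activated.add(n)
        activated |= newly_activated
        frontier = newly_activated
    return activated, bounded
```

Each pass of the outer loop corresponds to one parallel propagation step in the processor array;
the serial simulation visits each cell a bounded number of times.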

Three operators, marker-activated?, line-activated?, and ray-activated?, determine whether
other intermediate objects intersect activated regions. Operations like these could be
implemented by connecting corresponding elements in the retinotopic arrays representing the
various sorts of intermediate objects. For instance, in section 5 I have suggested implementing
lines by “drawing” them on a retinotopic array of computational elements; line-activated?
could be implemented by having these elements communicate with corresponding elements in
the activation plane to determine if they represent a point that is both activated and on the
line; computing a global logical OR over the array will yield the desired result.

14 Psychophysical study of gap filling in edge tracing, postulated as a related operator, is reported by Jolicoeur
et al. [21]. Ullman [59] suggests that gap filling is a ubiquitous operation in early vision and proposes neural
networks for the operation.

SIVS provides several other operators that manipulate activation planes. mark-centroid!
puts a marker at the centroid of an activated region. A biologically plausible implementation
might use a spreading activation computation [18]. convex? tells whether an activated region is
convex, and expand-to-convex-hull! takes two activation planes and makes one be the convex
hull of the other. Computing convex hulls is probably not psychologically realistic, but for the
purpose I put it to, a realistic and equally useful operation would be to compute a Gaussian
blurring of a region; such blurring operations have been demonstrated neurophysiologically in
human early vision [30].

transect-activation! takes two activation planes, a line, and a direction. It sets one of the
activation planes to be the subset of the other plane that is on the side of the line indicated
by the direction. (See figure 17 for an example.) One way to implement transection would be
in terms of the activation propagation algorithm previously described; it would require turning
on the boundary? bits in elements corresponding to locations along the line.
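
For comparison, the same result can be computed directly from coordinates. The Python sketch
below does not use the propagation approach suggested above; it substitutes a plain geometric
side-of-line test. The function name, the representation of a plane as a set of cells, and the
"left"/"right" encoding of the side (relative to travel from the first endpoint of the line to
the second, with y increasing upward) are all assumptions made for illustration.

```python
def transect_activation(source, line, side):
    """Return the cells of `source` lying strictly on one side of `line`.

    `source` is a set of (x, y) cells, `line` is ((x0, y0), (x1, y1)),
    and `side` is "left" or "right". Cells exactly on the line are dropped.
    """
    (x0, y0), (x1, y1) = line
    sign = 1 if side == "left" else -1
    return {
        (x, y)
        for (x, y) in source
        # Cross product of the line direction with the vector to the cell.
        if sign * ((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) > 0
    }
```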

6.2.6  Blanking 

Blanking  is  a  sort  of  housekeeping  operation.  A  blanked  intermediate  object  is  unused  and  has 
no  spatial  information  associated  with  it.  For  each  intermediate  object  there  is  an  operator 
which  tells  whether  or  not  it  is  blanked  and  one  that  blanks  it.  There  is  no  explicit  way 
to  unblank  an  object;  operators  that  side-effect  objects  unblank  them.  In  implementation, 
blankedness  is  just  a  bit  associated  with  each  intermediate  object. 

This  concludes  the  enumeration  of  SIVS’s  visual  operators.  It  is  not  hard  to  think  of  other 
operators  of  the  same  general  character  that  could  be  added  to  the  set.  (For  example,  SIVS 
does  not  currently  support  boolean  operations  on  activation  planes.)  This  suggests  that  the  set 
is  incomplete;  this  issue  is  taken  up  in  section  8.2.  However,  the  next  section  will  demonstrate 
that  SIVS’s  operators  are  adequate  to  support  complex  visually  guided  activity  in  at  least  one 
domain. 

7  Visually  guided  activity 

This  section  describes  the  use  of  SIVS  to  guide  action  in  Sonja.  Sonja  plays  a  competent 
beginner’s  game  of  Amazon.  Its  access  to  Amazon  is  only  via  SIVS  and  the  game’s  primitive 
actions.  Study  of  such  visually  guided  action  is  important  for  several  reasons: 

•  It  is  an  interesting  and  ubiquitous  phenomenon  in  its  own  right,  and  one  that  has  been 
relatively  little  studied,  at  least  within  AI. 


•  The  support  of  practical  activity  is  a  main  function  of  vision,  and  one  which  seems  to 
have  different  requirements  from  (for  example)  object  recognition. 

•  Practical  use  of  SIVS  shows  that  the  intermediate  visual  mechanisms  described  in  this 
paper  actually  are  useful.  This  does  not  necessarily  demonstrate  that  these  mechanisms 
will  be  useful  or  sufficient  in  other  domains  or  that  they  are  biologically  accurate  models, 
but it does suggest that they are worthy of further empirical and computational study.

• By coordinating the various intermediate visual processes integrated in SIVS, Sonja
demonstrates the combinatorial power of simple mechanisms.

• The use of psychologically realistic vision puts significant constraints on practical reasoning
and representation. For example, the bandwidth limitations of visual attention imply
limitations on representation in reasoning under time pressure, and thereby rule out some
but not all popular approaches to reasoning about action [6].

In  Sonja,  the  “control  system”  that  supplies  parameters  to  visual  operators  is  the  same  as 
that  responsible  for  reasoning  about  action;  it  is  described  in  [7].  Because  this  system  is  beyond 
the  scope  of  this  paper,  I  will  devote  this  section  to  presenting  an  example  of  Sonja  in  operation. 
This  example  will  give  a  sense  of  the  ways  in  which  many  different  sorts  of  visual  operations, 
including  search,  can  be  combined  effectively  to  guide  action;  it  should  not  be  hard  to  see  how 
to  make  the  relevant  control  decisions.  What  matters  is  that  the  control  mechanisms  base 
their  choices  on  visual  information  derived  from  visual  routines  using  the  operators  described 
in  section  6.2.  Sonja’s  cycle  time  is  well  under  a  second,  engendering  tight  coupling  between 
perception  and  action. 

Before  more  concrete  discussion,  I  must  explain  the  relevant  aspects  of  Sonja’s  domain, 
Amazon.  I  chose  a  video  game  as  a  domain  in  part  because  it  allowed  me  to  finesse  both 
early  and  late  vision  in  SIVS.  Doing  so  is  dangerous,  of  course;  as  section  8.1  points  out,  it 
may  not  be  possible  to  connect  a  SIVS-like  intermediate  visual  system  with  real  early  and  late 
vision. Amazon, as a domain, has several positive qualities from the point of view of vision research,
however. 

•  It  is  a  naturally  occurring  task  domain,  closely  patterned  on  a  popular  arcade  game. 
People  regularly  engage  in  the  same  task  Sonja  performs. 

• The task is intensely visual. Action decisions can be made mainly by examining the current,
visually accessible situation, without needing extensive reference to past situations
or elaborate reasoning about hypothetical future worlds.

• The scene presented on the game screen changes frequently, reflecting a dynamic underlying
domain, and affording opportunities for research on visual processing in the face of
change.

Amazon is a dungeons-and-dragons-like game based on the commercial game Gauntlet.
Figure 11 is a screen snapshot of the Amazon window as it appears to the player. The player
(a person or Sonja) controls an icon on the screen representing a woman warrior: the amazon.
The window actually provides a view into a small part of a larger underlying dungeon composed
of bricks which form obstacles, walls, and rooms. The window tracks the amazon; when it gets
close to the edge of the screen, the window smoothly moves over the underlying scene to keep
it within bounds, revealing new parts of the dungeon as it goes. The amazon can move in
the eight king’s-move directions only; motion is continuous (actually a pixel at a time and fast
enough that there is no noticeable flicker). In the dungeon there are various sorts of enemies
and tools. In figure 11, for instance, there is an amulet at the extreme left which, when picked
up, confers magical powers on the amazon. There are many further complexities to the domain
which are not relevant here.

Figure 11: A simple Amazon scene. The player controls the amazon icon, which in the scenario
presented in this section must navigate around the obstacle to get the amulet.

The  scenario  presented  in  this  section  will  demonstrate  only  a  small  fraction  of  Sonja’s 
abilities;  specifically,  one  of  several  routines  for  moving  the  amazon  about  in  the  dungeon. 
Sonja  plays  Amazon  from  the  same  perspective  a  human  does:  looking  at  the  screen,  which  is 
to  say  as  if  looking  down  on  the  dungeon  from  above,  not  from  the  point  of  view  of  the  amazon 
icon.  There  are  well-known  algorithms  for  navigating  in  mazes  seen  from  above;  depth-first 
search  is  an  obvious  one,  and  probably  one  can’t  do  much  better  in  the  general  case  of  complex 
and  deliberately  confusing  mazes.  I  have  not  adopted  such  a  solution.  Amazon’s  mazes  are 
simple  enough  that  it  is  visually  obvious  how  to  get  about  in  them;  search  would  be  overkill. 

It  seems  plausible  to  me  that  a  human  Amazon  player  instead  uses  visual  routines  whose 
job  it  is  to  determine  how  to  get  from  one  point  to  another.  Navigation  thus  depends  mainly 
on  continuous  visual  analysis  of  the  current  situation.  It  also  seems  plausible  that  there  are 
several  of  these  routines,  specific  for  different  sorts  of  situations.  For  example,  different  routines 
might  analyze  the  scene  in  terms  of  obstacles  to  avoid,  rooms  to  enter  or  exit,  or  passageways 
to  follow.  (Eye  tracking  studies  might  be  used  to  explore  this  hypothesis.) 

When a new Amazon game begins, SIVS is initialized, blanking all the intermediate objects.
The first order of business is to find the amazon using the visual search algorithm of section 3.2.
Sonja uses content-address! and inhibit-return! to enumerate things, such as the amazon,
whose value for the early dimension size is medium. Sonja checks each successively addressed
object to see whether it has the conjunction of early properties that are criterial for amazonhood,
namely being fiddly and having diagonal elements in addition to having size medium. Since
relatively few objects other than the amazon are of medium size, the search can be expected
to terminate quickly. (In figure 11, the only object Sonja might examine besides the amazon is
the amulet.) Sonja then permanently allocates a marker to tracking the amazon, using a copy
of the track! operator. Sonja also draws a ray from the amazon in the direction it is heading,
using draw-ray!. Figure 12 shows the outcome of this amazon-finding routine.

Once Sonja has found the amazon, it looks around the screen to find interesting objects.
It does this with another visual search, specifying the early dimension gray-level with desired
value non-zero, thereby enumerating all objects. There’s no way to avoid enumerating
everything if you want to be able to find interesting things of arbitrary types, because there is
no single early property that interesting things share that is not also shared by uninteresting
objects such as bits of wall. Accordingly, this search can spend many ticks enumerating bits
of wall before finding anything interesting. In the example at hand, Sonja eventually finds the
amulet, identifies it as such by examining its early properties, and drops a marker on it (figure
13).

Figure 12: Finding the amazon. Sonja tracks the amazon with the right-pointing marker and
extends a ray in the direction the amazon is heading. Recall that the inverse-video graphics
are only representations of the intermediate objects, which are not implemented graphically.

Figure 13: Sonja finds the amulet and marks it with the diamond-shaped marker.

To get the amazon to a goal (such as the amulet), Sonja first draws a line between their
respective markers using draw-line!. Then Sonja uses scan-along-line! to determine whether
this line intersects anything that would constitute an obstacle.

If there were no obstacle, the amazon could head directly for the goal. (Remember that
the amazon moves continuously.) Sonja finds the direction to the goal using
marker-to-marker-direction and by default heads the amazon in its major component. (The major and minor
components of directions were explained in section 6.2.3.)

In this scene, however, there is an obstacle to getting to the goal, and scan-along-line! drops a
marker on it (figure 14). Typically, the obstacle is a largish object; simply putting a marker on it
somewhere is not enough to know how to pass it. In order to discover its extent, Sonja uses
activate-connected-region!, passing it the obstacle marker, a type specification for obstacles,
and an activation plane. This activates the whole obstacle (figure 15).

Having  found  an  obstacle  to  getting  to  the  goal,  Sonja  has  to  figure  out  whether  or  not  the 
obstacle  is  a  room  that  must  be  entered  or  exited  or  whether  it  can  simply  be  passed  (as  is  the 
case). 

Sonja’s visual routine for determining whether or not an icon is in a room is similar to
the abstract containment routine proposed by Ullman and discussed in section 4. It uses
activate-connected-region! to spread activation outward from the icon until it runs into an
obstacle boundary. The operator knows to skip over short gaps in the boundary; in this case
those may constitute doorways in a room. In the scene illustrated in the figures, had the
amulet been in the center of the obstacle, the walls would have acted as a room to enter, with
the doorway at the bottom right. The operator activate-connected-region! fails if activation
runs off the edge of the screen, which corresponds to the “point at infinity” of section 4. If
activation extends off the screen, the goal is not bounded by a room, or at any rate not one
that is currently wholly visible. In figure 15, neither the amazon nor the amulet is in a room,
and activate-connected-region! fails for both.

If  the  amazon  is  in  a  room  and  the  goal  is  in  the  same  room,  or  if  neither  is  in  a  room,  then 
the  entering  and  exiting  code  is  not  applicable.  Responsibility  for  getting  to  the  destination 
rests  in  such  cases  with  code  for  passing  obstacles. 

The  passing  code  has  one  decision  to  make:  which  way  to  go  around  the  obstacle.  This 
decision  is  quite  complicated  in  general.  By  default,  the  best  way  around  is  the  shortest. 
However,  if  part  of  the  obstacle  is  offscreen,  the  apparently  shortest  way  around  may  not  be. 

To discover which is the shortest way around the obstacle, Sonja finds the centroid of its
convex hull. First it uses expand-to-convex-hull! to activate the convex hull of the obstacle.
Then it uses mark-centroid! to move the obstacle marker to the centroid of the convex hull
(figure 16). Sonja can then get an idea of which way around the obstacle is shorter by examining
the sign of the angle between the goal, the amazon, and the center (using angle-ccw?). It is
usually the case that the shortest way around has the same sign as this angle. In the figure, for
instance, the shorter way around is counterclockwise.

Next Sonja needs to determine whether the obstacle extends offscreen in the direction it
would pass it if it took the apparently shortest way around. If so, it would typically be better to
go the other way, because the obstacle might extend arbitrarily far offscreen. (Recall that the
screen shows only part of a much larger underlying world.) Sonja determines whether the obstacle
passes off screen by activating the portion it would pass around and determining whether
that portion touches the edge of the screen. Specifically, Sonja uses transect-activation! to
activate the portion of the obstacle that is on the appropriate side of the goal line (figure 17).
Sonja then uses an operator called activation-touches-screen-edge? to determine if the obstacle
runs off the screen in the direction it hopes to pass. This is the “inelegant” operator I
mentioned in section 6.1; it is not obvious how to treat the edge of the screen given that SIVS
does not truly implement early vision.

Figure 14: Marking the obstacle by finding the first thing along the line between Sonja and the
destination. The obstacle marker is the square one.

Figure 15: Activating the whole obstacle.

Figure 16: Marking the center of the convex hull of the obstacle to find the shorter way around.
The shorter way is given by the sign of the angle between the destination, amazon, and obstacle
markers: counterclockwise, in this case.

Now Sonja has a best guess as to which way to go around: the apparently shortest way
unless it runs off the screen. To pass the obstacle, Sonja first sets the amazon’s heading to
the major component of the direction to the goal: left, in this case. This makes the heading
ray pass through the obstacle, a condition Sonja senses with ray-activated?. Sonja then tries
successively more indirect candidate directions by repeatedly setting the amazon’s heading to
be 45 degrees from its current heading, rotated in the direction opposite to that in which the
amazon will pass around the obstacle, until the heading ray no longer intersects the activated
region. (See figure 18.)
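
The heading search just described can be sketched in Python as follows. This is a hedged
illustration rather than Sonja's control code: the names ray_activated and find_clear_heading
are hypothetical, the activated region is represented as a set of grid cells, the ray is stepped
one cell at a time along one of the eight directions, and y is assumed to increase upward so that
a lower index in the list below is a clockwise rotation.

```python
# Unit steps for the eight king's-move directions, counterclockwise from "right".
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def ray_activated(origin, heading, activated, width, height):
    """True if the ray from `origin` along STEPS[heading] hits an activated cell."""
    dx, dy = STEPS[heading]
    x, y = origin[0] + dx, origin[1] + dy
    while 0 <= x < width and 0 <= y < height:
        if (x, y) in activated:
            return True
        x, y = x + dx, y + dy
    return False


def find_clear_heading(origin, heading, pass_ccw, activated, width, height):
    """Rotate the heading in 45-degree steps until the heading ray is clear.

    The rotation is opposite to the sense in which the obstacle will be
    passed: clockwise when passing counterclockwise, and vice versa.
    Returns the index of a clear heading, or None if all eight are blocked.
    """
    step = -1 if pass_ccw else 1
    for _ in range(8):
        if not ray_activated(origin, heading, activated, width, height):
            return heading
        heading = (heading + step) % 8
    return None
```

With an obstacle to the left that is to be passed counterclockwise, the sketch tries left, then
up-left, then up, mirroring the sequence shown in figure 18.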

Eventually  Sonja  will  have  gone  far  enough  that  it  can  turn  to  head  more  directly  towards 
the  goal.  Periodically  it  forces  the  amazon’s  heading  45  degrees  closer  to  the  goal  (the  opposite 
rotation  from  that  used  to  find  an  initial  valid  heading).  If  the  heading  ray  is  still  clear  of  the 
obstacle,  Sonja  tries  again,  rotating  until  it  has  gone  too  far  and  runs  back  into  the  obstacle 
(figure  19).  Then  Sonja  rotates  out  one  increment  so  that  the  ray  is  in  the  clear  again.  In  figure 
20  we  see  the  amazon  having  gone  far  enough  that  a  new  heading  is  valid. 


Figure 17: Activating the apparently shorter part of the obstacle. The upper portion is activated
with both the original obstacle plane (diagonal increasing hatch pattern) and another (diagonal
decreasing hatch pattern), yielding a lozenge pattern.

Figure 18: Searching for a heading. Sonja rotates the amazon, and the heading ray, progressively
clockwise, until the ray no longer intersects the obstacle. In this case Sonja first tries heading
left, finds that doing so causes the ray to intersect with the obstacle, and so tries heading
diagonally up and left, which also fails. Finally it tries heading up, and succeeds.

Figure 19: Rotating the heading ray back

Figure 20: The amazon has gone far enough that a new heading will work.


8  Conclusions 


SIVS is a first attempt at an integrated intermediate level vision architecture and so necessarily
cuts  some  corners.  Section  8.1  describes  some  of  these  cut  corners.  Section  8.2  evaluates 
SIVS  according  to  the  criteria  of  generality  and  usefulness  posed  in  section  6.1.  Section  8.3 
summarizes  the  successes  of  the  system.  Though  this  exploratory  investigation  raises  more 
questions  than  it  answers,  it  demonstrates  that  familiar  mechanisms  such  as  visual  search  and 
attention  can  be  made  practically  useful  in  a  computational  implementation. 

8.1  Outstanding  problems 

The  most  pressing  problems  in  both  the  model  and  implementation  presented  here  are  the  lack 
of  treatment  of  early  vision  and  of  object  recognition. 

SIVS  bypasses  all  of  the  standard  difficulties  with  early  vision,  such  as  noise  and  illumination 
variations.  This  raises  the  issue  of  whether  this  model  of  vision  can  be  extended  to  domains  in 
which  early  vision  is  harder.  Unfortunately,  other  research  on  intermediate  vision  has  similarly 
failed to address this issue. The relevant psychophysical studies use clean, evenly lit displays
with  highly  discriminable  stimuli.  Ullman's  visual  routines  paper  uses  as  examples  diagram 
interpretation  tasks  in  which  noise  issues  can  be  ignored,  and  this  tradition  has  been  continued 
by  other  researchers  in  the  area.  Noise  sensitivity  is  a  serious  issue  because  it  is  the  job  of 
the  visual  operators  to  reduce  noisy  retinotopic  arrays  to  compact  encodings,  carrying  boolean 
values  in  many  cases.  So  unlike  the  outputs  of  early  vision,  which  vary  continuously,  the 
outputs  of  visual  operators  may  be  very  wrong  if  they  are  not  exactly  right.  Thus,  they  had 
better  not  be  sensitive  to  noise.  Is  it  possible  to  implement  noise-insensitive  operators  that 
perform  operations  similar  to  those  of  SIVS?  This  is  an  open  question  on  which  the  plausibility 
of the model rests. In current research I am connecting SIVS with a real-time early vision system
constructed by Nishihara [37, 38] in order to support a robot system.

SIVS  does  not  address  the  hard  issues  in  object  recognition  because  Amazon  is  atypical  in 
that  objects  can  be  categorized  simply  by  combinations  of  early  properties.  The  number  of 
types  of  objects  (icons)  that  can  appear  on  the  screen  is  small,  and  problems  such  as  occlusion, 
the  rotational  instability  of  features,  and  non-rigid  motion  are  not  found  in  the  domain. 

Still, giving SIVS access to the icon data structures does not fully finesse object recognition.
What  constitutes  an  object  for  the  Amazon  code  and  what  constitutes  an  object  for  Sonja  differ 
in  some  cases,  so  that  Sonja  has  to  do  significant  work  to  identify  objects.  Amazon  represents 
walls  in  terms  of  uniform  square  chunks  of  wall-stuff,  and  represents  only  the  positions  of  these 
chunks,  not  their  connectivity  relationships.  Accordingly,  as  described  in  section  7,  Sonja  must 
use  a  connectivity  operator  to  segment  obstacles  from  the  background. 

Ullman originally proposed that visual routines are a preprocessing stage for shape recognition
[60]. More recent work suggests that the two are parallel channels each connecting the
early maps with the central system. It is tempting, in fact, to identify shape matching and
the  visual  routines  processor  with  the  temporal  and  parietal  visual  processing  streams,  devoted 
respectively  to  object  recognition  and  the  assessment  of  spatial  relationships,  described  by 
Ungerleider and Mishkin [61].

8.2 Evaluating the visual operators

This section evaluates Sonja’s VRP according to the three “global” criteria posed in section
6.1.

• We want a “spanning” set: that is, a set of operators that together are sufficient for any
task. Here “any task” may mean “any psychologically plausible task” or, for engineers,
“any task in the class of domains of interest.” Thus the set of visual operators, when combined
into routines, is to form a finite means for the realization of an infinite collection
of possible visual processes.

It  is  hard  to  formulate  this  criterion  rigorously,  but  it  is  clear  that  SIVS  does  not  satisfy  it. 
I  added  operators  to  the  set  as  needed  for  particular  tasks.  By  the  end  of  the  implementation, 
I  found  I  often  had  all  the  operators  I  needed  to  implement  a  new  routine,  but  not  always. 
This  suggests  that  I  may  have  been  approaching  an  adequate  set,  but  I’m  sure  that  even  in  the 
one  domain  of  playing  Amazon  continued  system  development  would  occasionally  require  new 
operators. 

This raises the concern that there is no spanning set, or that it would be too large to implement
with  the  amount  of  hardware  found  in  the  human  brain.  The  visual  routines  model  is  only 
plausible  if  a  relatively  small  spanning  set  is  possible;  since  operators  correspond  to  innate  bits 
of  hardware,  there  can  be  only  as  many  as  will  fit  in  the  brain.  (Given  that  we  know  little 
about  how  intermediate  vision  is  actually  implemented,  it  is  hard  to  say  how  many  this  would 
be.  Hundreds  might  be  feasible;  millions  probably  wouldn’t  be.)  Further  research,  for  example 
cross-domain  studies  or  formal  analysis  of  the  space  of  spatial  reasoning  tasks,  is  required  to 
address  this  issue. 

•  A  set  of  operators  should  not  only  make  it  possible  to  implement  any  visual  task,  it 
ought  to  make  it  easy.  From  an  engineering  standpoint,  the  VRP  should  present  a  nice 
programming  system. 

It is hard to evaluate SIVS on this score because I never had an opportunity to implement
visual routines with a completed and debugged VRP. This largely negated the visual-system-
as-a-programming-language metaphor I was trying to create. Only by the end of the implementation
was the VRP relatively complete and reliable; by that time, implementing new routines
was fairly straightforward. Further experience with the architecture is needed.

My feeling, however, is that the set of operators in SIVS is on the whole too low-level.
Writing routines often seemed to require more work than it felt intuitively like it ought to. It
required too much visual “bit diddling”; I wanted to be able to express things more abstractly.
Such abstraction might be provided by higher-level operators. It might also, however, be
provided by a “library” of general-purpose parameterized routines which would use low-level
operators.

• A set of operators should also make it easy to learn new visual routines.

Lacking adequate theories of learning, it is hard to evaluate SIVS on this score. My thesis [7]
suggests experiments that might elucidate the problem. It seems likely that learnability would
be enhanced by the other criteria I have discussed. Learning is easier when your primitives
span a space in which useful combinations are relatively dense [27].

8.3 Successes

This  paper  describes  an  implemented,  integrated  model  of  intermediate  vision  which  addresses 
several  fundamental  but  often  neglected  problems:  selective  application  of  visual  processing 
to  subsets  of  the  image,  search  for  regions  of  the  image  with  task-relevant  properties,  and 
the computation of spatial relationships among parts of the image. These problems arise in
most real visual tasks. SIVS's visual attention mechanism models psychophysical evidence. Its
model  of  visual  search  is  based  on  Treisman’s  theory  [54,  55]  and  is  the  first  implementation  of 
visual  search  that  models  psychophysical  results.  SIVS's  visual  routines  processor  substantially 
extends  Ullman’s  proposals  [60],  specifies  plausible  interfaces  between  visual  routines  and  earlier 
and  later  processing,  and  is  the  first  to  apply  visual  routines  to  a  natural  task.  All  of  these 
mechanisms  are  designed  to  be  implementable  in  biological  hardware.  Sonja  demonstrates  the 
value  of  the  mechanisms  by  using  them  to  support  visually-guided  activity. 

Because  the  biology  of  and  computational  constraints  on  intermediate  vision  are  still  poorly 
understood,  many  of  the  mechanisms  proposed  in  this  paper  represent  informed  guesses.  These 
proposals  pose  new  open  problems  which  can  be  the  basis  for  experimental  tests,  many  of  which 
I have suggested.

Chapter 1

Motivation  and  Plan 


1.1  Outline 

We  would  like  to  produce  a  demonstration  of  robotics  capabilities  illustrating  approaches 
and  techniques  in  the  areas  of  real  time  perception,  reactive  control  and  planning,  and 
manipulation.  An  important  characteristic  of  our  approach  is  to  make  all  the  elements 
of  the  demo  as  general  as  possible,  so  that  we  can  extend  the  system  in  many  different 
directions  to  enable  additional  demonstration  capabilities. 

The  goal  of  this  plan  is  to  formalize  our  approach,  and  to  concentrate  our  efforts  so 
that a large group of people might contribute fully to a significant combined effort.

1.1.1  Philosophy 

Our  aim  is  to  create  a  set  of  generalized  methods  at  each  level  in  the  architecture  which 
we  hope  will  lead  to  a  cross-product  of  capabilities.  The  addition  of  a  new  capability  at 
any  level  multiplies  the  performance  of  the  whole  system  rather  than  simply  adding  to  it. 

We  intend  to  develop  a  system  which  is  clearly  programmed  in  a  generic  way,  in  which 
the  subcomponents  are  each  applicable  in  a  wide  range  of  situations,  so  that  the  whole 
system  can  respond  to  situations  far  outside  those  few  which  could  have  been  enumerated 
by  the  designers  of  the  system  and  of  each  component.  We  hope  to  demonstrate  incremental 
progress  down  a  well  defined  path  toward  such  richness  of  performance. 

1.1.2  Methodology 

We will tackle this project in a series of stages or slices, each of which will have four main
phases:

1.  Define  an  incremental  level  of  performance  which  can  be  achieved  in  a  period  of 
about  a  month.  Define  the  architecture,  interfaces  and  communications  languages  of 
the  complete  system.  Determine  the  additional  capabilities  which  will  be  required 
at  each  level. 

2. Create a skeleton system in which each module exists in some nominal fashion.
The skeleton system should allow developments in each separate module to be at
least partially tested in its environment. After the first stage, this slice will involve
something like implementing the extended interface definitions and recompiling.

3. Extend the capabilities of each module in parallel, to include those requirements
identified at the start of each stage.

4.  Integrate  all  the  modules,  test  and  evaluate  the  system. 

1.1.3  Robot  games 

A proposed sequence of performance goals will permit the system to play ‘robot games’
such as those listed below.

1.  Toggle  (pick  up,  put  down;  hold  this,  hold  that). 

2.  Hide  and  Seek. 

3.  Passing  back  and  forth.  (Human  hand) 

4.  Tag? 

5.  Catch. 

6.  Pour  me  a  drink. 

7.  Simon  says  (user  specified  goals  in  some  language). 

It  should  be  emphasized  that  we  hope  to  build  a  system  whose  capabilities  cannot  be 
enumerated,  where  the  compositional  structure  of  the  goals  provides  great  richness.  The 
games  listed  above  are  simply  intended  to  motivate  and  illustrate  initial  subsets  of  the 
possibilities. 


1.2  Architecture 

We conceive of the system as having three components: next state computation, internal
state, and action control, as illustrated in Figure 1.1. The system operates in a regular,
clocked mode, in which the current state vector is updated based upon new inputs and the
previous state.

The  three  components  can  be  divided  into  a  number  of  subcomponents.  We  identify 
three  major  components  of  the  internal  state: 

Recognition state: Recognition of objects and characteristics of the world must take
place in stages over time, with successive observations accumulating evidence toward
recognition. This accumulating evidence is part of the internal state of the system.

Database: The system maintains its current picture of the world in the form of an objects
and properties database.

Goals: The current goals (volition) of the system are encoded in another piece of the internal
state.

Figure 1.1: Proposed demo system architecture.

Each  of  these  subdivisions  of  the  internal  state  vector  is  updated  based  upon  different 
mixtures  of  previous  state  and  new  inputs.  Thus  the  next  state  computation  may  also  be 
divided into the same three subparts.

Recognition: The recognition component is responsible for accumulating evidence about
properties of the world in the recognition state; to do this, it considers raw input
from the world, the properties of other objects in the world (obtained from the
database state), and previous evidence collected in the recognition state. In addition,
manipulator status is recognized and used to update properties of the manipulator
objects in the database.

Database: The database of world properties is maintained by degradation of the information
over time. The main input to the next state computation for the database is
the previous state of the database, which contains all the data about velocities,
accelerations, etc. for each object. In addition, the recognition subsystem can propose
new objects for the database, or may be able to retract previous assertions made to
the database. Thus additional input is obtained from the recognition state.

Goals: The goals of the system are derived from speech input data, and are maintained
over time by copying the previous state.
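
The clocked update of the three parts of the internal state can be sketched as follows. This is only an illustration of the data flow described above, not the planned Rex implementation: the `State` fields, the `EVIDENCE_THRESHOLD` constant, the use of an age counter to stand in for degradation, and the representation of percepts as lists of names are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    recognition: dict = field(default_factory=dict)  # evidence count per hypothesis
    database: dict = field(default_factory=dict)     # object name -> ticks since confirmed
    goals: tuple = ()                                 # current goals

EVIDENCE_THRESHOLD = 3  # hypothetical: observations needed before asserting an object

def next_state(prev, percepts, speech):
    """One clock tick: compute the new state from the previous state and new inputs."""
    # Recognition: add this tick's raw observations to the previous evidence.
    recognition = dict(prev.recognition)
    for hypothesis in percepts:
        recognition[hypothesis] = recognition.get(hypothesis, 0) + 1
    # Database: degrade (age) every entry carried over from the previous state,
    # then accept objects that recognition now proposes with enough evidence.
    database = {name: age + 1 for name, age in prev.database.items()}
    for hypothesis in percepts:
        if recognition[hypothesis] >= EVIDENCE_THRESHOLD:
            database[hypothesis] = 0
    # Goals: taken from new speech input, otherwise copied from the previous state.
    goals = tuple(speech) if speech else prev.goals
    return State(recognition, database, goals)

# Hypothetical input stream: (percepts, speech) for four ticks.
state = State()
inputs = [(["cup"], ["find cup"]), (["cup"], []), (["cup", "ball"], []), ([], [])]
for percepts, speech in inputs:
    state = next_state(state, percepts, speech)
```

The cup enters the database only on the third tick, once evidence has accumulated, and starts to age on the fourth tick when it is no longer observed.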

Based upon the current input state, the system can take various actions to satisfy its
goals. The actions can also be broken down into several parts:

Sensor control: Perception of the world is an active process, and demands that the sensors
generating the raw perceptual data are pointed and controlled to acquire the data
most useful for recognition of conditions in the world. The sensor control actions are
dependent upon the current recognition state, and also upon the current objects in
the world. They are also dependent upon the current goals of the system.

Arm control: The manipulators are the principal means by which the system interacts
with the world to achieve its goals. The arm control outputs depend upon the objects
currently in the world database, and also the goals of the system.

Speech output: Speech (or other textual output) is important for answering queries, and
for prompting or asking for assistance or clarification from the user of the system.
The speech output is based upon the current goals and the state of the objects in
the world database.

Figure 1.2: Proposed demo system architecture.

While  the  model  of  the  system  presented  above  is  conceptually  simple  and  elegant,  it 
is  more  convenient  to  break  it  down  into  horizontal  slices  to  enable  different  pieces  to  be 
constructed  in  parallel  as  separate  modules.  Figure  1.2  shows  a  modular  decomposition  of 
the  architecture  described  above. 

The  relationship  between  the  parts  of  the  two  figures  is  described  below: 

Perceptual modules: The raw input data requires a great deal of specialized processing
before it is suitable as updates to the recognition state. The perceptual modules of
Figure 1.2 encapsulate such processing, and are viewed as a set of tools which can
be pointed and controlled in order to ask specific questions about the state of the
world. They receive inputs from the sensor-control action module in addition to raw
world data, and generate outputs to the recognition system.

Recognition: The recognition module of Figure 1.2 groups together the recognition
computations, and also the recognition internal state. The third part of the recognition
slice is the sensor control section.

Database: The database component includes the database state, and the update rules.
Inputs are obtained from the recognition module (both world properties and textual inputs).

Database projection: The action block of Figure 1.1 must test many different conditions
about the world represented in the database. In order to ensure that the development
of each of the parts of the actions block is well structured, the computation
of predicates representing these world conditions has been abstracted into a block
called database projection (because it projects the facts in the database into a set
of conditions and properties of direct interest for determining actions to perform).
A central task is to determine a language for specifying the conditions that will
be implemented in database projection, that will motivate the type of recognition
and perception capabilities which will be required, and that will control the type of
actions which can be implemented.

Goal strategies and action control: The actions block remains grouped together, as
the different actions which can be performed must be mediated by some common
system so that common resources (manipulators, time, attention) can be allocated
according to priorities established by the goals, and by a set of top level goal achievement
strategies. The goal strategies will have available sets of action strategies furnished
as speech actions, sensor control actions, and manipulation actions.

Arm control: The arm control block translates the action commands from the actions
block into low-level manipulator commands, and feeds status information and force
data back into the recognition module.

Speech control: The speech control block translates the speech commands from the actions
block into low-level speech commands. It also interprets speech input and feeds
the data to recognition.
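
The priority-based mediation of common resources described under goal strategies and action control might look like the sketch below. It is one possible arbitration rule, not a description of how Gapps resolves conflicts; the function `allocate`, the numeric priorities, and the resource names are assumptions for the example.

```python
def allocate(requests):
    """Grant each shared resource to the highest-priority action requesting it.

    requests: list of (priority, action, resources); a larger priority wins.
    An action is granted only if every resource it needs is still free.
    """
    granted, owner = [], {}
    # sorted() is stable, so equal priorities keep their original order.
    for priority, action, resources in sorted(requests, key=lambda r: -r[0]):
        if all(res not in owner for res in resources):
            for res in resources:
                owner[res] = action
            granted.append(action)
    return granted

# Hypothetical requests competing for the arm, the camera, and the speech channel.
requests = [
    (1, "say-status", {"speech"}),
    (3, "pick-up-cup", {"arm", "camera"}),
    (2, "track-hand", {"camera"}),
]
granted = allocate(requests)
```

Here tracking the hand loses the camera to the higher-priority pick-up action, while the status report proceeds because it needs only the speech channel.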

The  next  sections  discuss  the  requirements  for  each  of  these  modules  in  more  detail. 

1.2.1  Perception 

A  number  of  perception  modalities  are  anticipated,  including  vision,  touch/force,  and 
sound. 

The visual perception capability is viewed as a set of modules making reliable, robust,
real-time  measurements  of  physical  parameters  of  the  world.  They  are  to  be  viewed  as 
tools,  which  can  be  controlled  and  queried  by  the  higher  level  processes. 

Touch  and  force  feedback  is  afforded  by  the  robot  manipulators  available  and  will  be 
used  as  an  additional  capability  for  determining  details  of  the  world. 

Speech  input  and  output  (initially  simulated  using  a  keyboard  and  screen)  will  be  used 
for  goal  acquisition  and  for  the  system  to  interact  with  users  by  outputting  status  or 
requests for the user.

1.2.2 Recognition

The recognition component will take as input the results of queries made of the low-level
perceptual machinery and force-feedback data and will transduce that into high-level
statements about objects and their relations. The objects that we would like to recognize (at
least for the initial stages of the project) are: table, robot arm and hand, human arm and
hand, coffee cup, soda can, ping-pong ball, and freezer container. The recognition component
will have its own internal state and will be responsible for managing the uncertainty
connected with intermediate stages of recognition. When it passes information out to the
next component, it will be taken to be true.

The  recognition  process  also  has  an  action  component.  It  will  be  important  to  direct  the 
low-level  perceptual  sensors  to  parts  of  the  world  that  will  be  of  most  use  to  the  recognition 
component.  In  a  similar  vein,  the  recognition  actions  will  be  triggered  by  high-level  goals, 
which  will  tell  the  robot  to  look  for  ping-pong  balls,  or  track  the  human  hand,  or  search 
for  a  flat  surface  on  which  to  place  some  object.  The  recognition-action  component  may 
also  direct  the  arm  in  order  to  use  the  force-feedback  to  find  the  location  of  an  object. 
Goals  of  information  will  have  to  be  prioritized  with  other  goals  in  order  to  avoid  conflicts. 

The recognition process may also use information from the database to guide its perceptual
activities and inferences. An interface must be carefully specified in order to shield
this component from the implementation details of the database.

The recognition component can be tested, at the early stages, with no other components,
by giving the agent top-level goals of information and simply tracing the outputs of
the recognition module.

Individual  steps  in  building  the  recognition  component  are  hard  to  identify.  A  graded 
set  of  abilities  can  be  described,  however,  as  follows: 

1.  Static  object  alone  against  contrasting  background;  no  occluding  or  even  distracting 
objects. 

2.  Static  object  in  scene  with  others,  but  no  occlusion. 

3.  Slowly-moving  object  with  no  occlusion  or  confusion. 

4.  Slowly-moving  object  with  other  objects  but  no  occlusion. 

5.  Static  objects  that  occlude  one  another. 

6.  Moving  objects  that  occlude  one  another. 

1.2.3  Database 

The  database  component  will  be  responsible  for  storing  descriptions  of  objects  found  by 
the  recognition  component.  It  will  store  information  about  a  finite  number  of  objects  and 
their  relations  with  one  another  in  very  general  terms.  This  information  will  include  a  tag 
that  describes  the  type  of  the  object  and  bounds  on  the  locations  of  objects  with  respect 
to  one  another  as  well  as  their  relative  velocities.  The  information  will  come  from  the 
recognition  component. 


This component can probably be developed and tested initially by using synthetic data.
An important early decision is whether to implement it in C or in Rex. If we do it in C
it will require a fairly large amount of system hacking to get the interfaces to work right;
doing it in Rex entails a fairly large performance penalty due to the inefficiency of indexing,
but would make it easy to integrate with the rest of the system.

The  database  must  perform  the  following  operations: 

Merging:  If  two  database  entries  can  be  shown  to  be  the  same  physical  object  (perhaps 
because  they  occupy  the  same  space),  then  their  parameters  should  be  intersected 
and  one  of  the  instances  deleted. 

Propagation:  If  it  is  possible  to  deduce  something  about  the  relation  between  objects 
i  and  k  from  the  relations  between  objects  i  and  j  and  objects  j  and  k,  then  the 
relation  between  objects  i  and  k  should  be  strengthened  accordingly.  It  is  difficult 
to bound the number of propagation steps required to get the database into a steady
state. In practice, a certain number of propagations will be performed each tick,
leaving  some  relational  information  incompletely  localized. 

Degradation:  As  time  passes,  the  information  that  the  database  has  about  a  particular 
object  will  degrade.  The  database  must  degrade  its  information  every  tick,  based  on 
what  it  knows  about  individual  objects.  For  instance,  it  might  know  that  a  coffee 
cup  with  no  arms  near  it  will  stay  where  it  is  from  one  tick  to  the  next,  or  that  the 
robot  arm  can  only  move  at  a  certain  maximum  velocity  so  that  it  must  be  within  a 
certain  distance  of  where  it  was  last  tick. 

Inconsistency  reduction:  Although  the  recognition  component  strives  to  come  to  true 
conclusions,  sometimes  it  will  err.  The  database  may  notice  such  errors  in  either  the 
merging  or  the  propagation  phase.  When  it  tries  to  intersect  two  sets  of  properties 
and  relations  for  an  object  and  finds  a  contradiction,  it  must  remove  all  of  the  objects 
involved.  For  instance,  if  it  decides  that  two  entries  must  be  the  same  because  they 
occupy  the  same  space,  but  that  they  are  different  colors,  it  will  throw  them  both 
out. 

Pruning:  There  will  be  only  enough  room  in  the  database  for  a  finite  number  of  objects 
(in  order  to  keep  the  update  time  bounded).  When  a  new  object  is  found  by  the 
recognition  component  and  the  database  is  full,  it  must  decide  which  object  to  throw 
away. This may be based on the recency or specificity of information about the object.
It  might  be  more  reasonable  to  make  it  depend  to  some  degree  on  the  agent’s  current 
goals;  this  would  require  feeding  goal  information  back  to  the  database  component. 
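
The five operations can be made concrete with a small sketch. It assumes, purely for illustration, that each entry carries a type tag, a color, and one-dimensional position bounds, and that pruning uses recency alone; the names `Entry`, `intersect`, `merge`, `degrade`, `propagate`, and `add` are hypothetical, and neither the C nor the Rex design is implied.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Entry:
    kind: str         # type tag, e.g. "cup"
    color: str
    lo: float         # lower bound on position (1-D for brevity)
    hi: float         # upper bound on position
    max_speed: float  # farthest the object can move in one tick
    age: int = 0      # ticks since the entry was last confirmed

def intersect(a, b):
    """Intersect two entries believed to be one object; None means contradiction."""
    lo, hi = max(a.lo, b.lo), min(a.hi, b.hi)
    if a.kind != b.kind or a.color != b.color or lo > hi:
        return None
    return Entry(a.kind, a.color, lo, hi, min(a.max_speed, b.max_speed), min(a.age, b.age))

def merge(db, a, b):
    """Merging plus inconsistency reduction: keep the intersection, or drop both."""
    merged = intersect(a, b)
    rest = [e for e in db if e is not a and e is not b]
    return rest + ([merged] if merged is not None else [])

def degrade(e):
    """Degradation: widen bounds by the distance the object could move in one tick."""
    return replace(e, lo=e.lo - e.max_speed, hi=e.hi + e.max_speed, age=e.age + 1)

def propagate(ik, ij, jk):
    """Propagation: tighten bounds on (k - i) using bounds on (j - i) and (k - j)."""
    return (max(ik[0], ij[0] + jk[0]), min(ik[1], ij[1] + jk[1]))

def add(db, new, capacity):
    """Pruning: when the database is full, discard the least recently confirmed entry."""
    if len(db) >= capacity:
        oldest = max(db, key=lambda e: e.age)
        db = [e for e in db if e is not oldest]
    return db + [new]

# Hypothetical entries: two sightings of a white cup, a red cup, and the robot arm.
cup_a = Entry("cup", "white", 2.0, 4.0, 0.0)
cup_b = Entry("cup", "white", 3.0, 5.0, 0.0)
red_cup = Entry("cup", "red", 3.0, 5.0, 0.0)
arm = Entry("arm", "grey", 0.0, 1.0, 0.5)

db = merge([cup_a, cup_b, arm], cup_a, cup_b)                  # one cup left, bounds [3.0, 4.0]
db_after_clash = merge([cup_a, red_cup, arm], cup_a, red_cup)  # colors differ: both removed
arm_next = degrade(arm)                                        # bounds widen to [-0.5, 1.5]
```

A stationary cup has `max_speed` 0.0, so degrading it leaves its bounds unchanged, which matches the example of a coffee cup with no arms near it.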

1.2.4  Manipulation 

The  manipulation  component  will  consist  of  a  set  of  low-level  manipulation  strategies, 
written  in  Gapps,  that  achieve  or  maintain  particular  goals.  These  strategies  are  expected 
to  make  use  of  conditions  that  can  be  derived  from  information  in  the  database  as  well 
as (perhaps) direct use of force-feedback data. The implementors of the manipulation
component should tell the database implementors if they would like to have the force
information directly or channeled through the database.

Many  of  these  abilities  can  be  debugged  initially  by  fixing  the  locations  of  the  objects 
and  using  force  information  directly  (unless  we  want  or  need  to  do  visual  servoing;  if  this 
is  the  case,  we  may  want  to  work  on  data  directly  from  the  recognition  component  or  from 
the  database).  We  may  want  to  invent  more  abilities  that  show  off  force  control  or  other 
novel  capabilities  of  the  robot  and  its  programmers. 

Below  is  a  list  of  possible  manipulation  abilities.  Although  they  are  listed  individually, 
it  would  be  best  if  the  entire  set  of  abilities  could  be  generated  from  a  few  major  kinds  of 
manipulations  and  different  sets  of  parameters. 

Desired  manipulation  abilities  include: 

1.  Pick  up  ping-pong  ball. 

2.  Pick  up  cup  (from  different  orientations). 

3.  Pick  up  soda  can. 

4.  Pick  up  freezer  container  (from  different  orientations). 

5.  Open  soda  can  (if  possible). 

6.  Pour  liquid  from  one  vessel  to  another  (soda  can  to  cup  is  a  good  starting  case). 

7.  Place  picked-up  objects  back  on  table. 

8.  Put  ping-pong  ball  inside  cup  or  freezer  container. 

9.  Put  freezer  container  or  cup  over  top  of  ping-pong  ball. 

10.  Put  object  in  human’s  hand. 

11.  Hold  objects  for  human  to  take,  noticing  when  he  has  a  hold  of  it  and  letting  go. 

12.  Take  objects  from  human. 

1.2.5  High-level  Action 

The  high-level  action  component  will  initially  consist  of  a  set  of  strategies  for  playing 
interactive  “games”  with  a  human  user.  A  more  sophisticated  version  will  have  the  user 
describe  (in  natural-ish  language)  what  game  he  wants  to  play. 

Some  possible  games  are: 

Pick-up/Put-down:  This  game  can  be  played  with  any  object;  the  robot  is  to  pick  up 
the  object  if  it  is  on  the  table  and  put  it  down  if  it  is  not.  Can  be  generalized  to  toggle 
between  any  two  user-specified  conditions.  The  conditions  can  be  finite  conjunctions 
(or  prios)  of  ach’s  and  maint’s.  Need  way  of  specifying  to  the  robot  that  it  should 
stop the current game and start another one.

Catch: In the simple case, give an object to the human, then take it from him. In the
more  complex  case,  play  real  catch  by  throwing  the  object.  Not  sure  we  have  enough 
precision  in  when  the  gripper  opens  and  closes  to  do  this.  Could  catch  the  ball  in  a 
cup  and  “throw”  it  by  dropping  it. 

Hide  and  Seek:  Tell  the  robot  to  find  a  particular  object.  Robot  indicates  the  object 
(somehow)  when  it  is  found.  May  entail  physical  actions  like  picking  up  an  overturned 
coffee  cup  to  see  if  there  is  a  ping-pong  ball  under  it. 

Tag:  Dangerous.  Robot  tries  to  tag  human  hand,  then  human  tries  to  tag  robot  hand 
while  robot  tries  to  get  away. 

Simon  Says:  All  of  these  games  can  be  expressed  as  fairly  simple  Gapps  goals.  In  the 
long  run,  implement  run-time  goal-reduction  and  all  the  Gapps  goal  operators  at 
run  time  with  a  fixed  bound  on  total  goal  length.  With  a  suitable  set  of  primitive 
conditions,  there  could  be  a  wide  variety  of  interesting  robot  games.  This  ability 
can  be  approximated  in  the  shorter  term  by  having  a  set  of  high-level  goal  types  and 
allowing  the  user  to  give  the  agent  simple  conjunctions  of  parametrized  instances 
of  the  goal  types.  An  example  would  be:  (ach,  maint)  (on,  in)  <referring-expr> 
<referring-expr>.  Syntactic  sugar  could  be  used  to  make  the  language  more  natural. 
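
The shorter-term approximation, in which the user gives conjunctions of parametrized goal instances, could be parsed along the following lines. This is a sketch under several assumptions: "ach" and "maint" are read as achieve and maintain, each referring expression is a single word, conjunctions are joined by the word "and", and the names `Goal`, `parse_goal`, and `parse_conjunction` are hypothetical.

```python
from typing import NamedTuple

class Goal(NamedTuple):
    mode: str      # "ach" (achieve) or "maint" (maintain)
    relation: str  # "on" or "in"
    subject: str   # first referring expression
    target: str    # second referring expression

MODES = {"ach", "maint"}
RELATIONS = {"on", "in"}

def parse_goal(text):
    """Parse one goal of the form '<mode> <relation> <referring-expr> <referring-expr>'."""
    words = text.split()
    if len(words) != 4:
        raise ValueError(f"expected 4 words, got {len(words)}: {text!r}")
    mode, relation, subject, target = words
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation!r}")
    return Goal(mode, relation, subject, target)

def parse_conjunction(text):
    """Parse a simple conjunction of goals separated by the word 'and'."""
    return [parse_goal(part) for part in text.split(" and ")]

# Hypothetical user input: put the ball in the cup and keep the cup on the table.
goals = parse_conjunction("ach in ball cup and maint on cup table")
```

Syntactic sugar of the kind mentioned above would sit in front of `parse_conjunction`, rewriting more natural phrasings into this fixed four-word form.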

1.2.6  Database  projection 

In  the  course  of  writing  the  high-level  action  strategies,  the  programmer  will  need  to  test 
a  number  of  conditions.  These  conditions  will  not  be  directly  available  in  the  database, 
but should be projectable from the information contained in the database. In addition,
there will be a set of standard world conditions that the human can use to communicate
with the robot. The task of high-level perception is to implement the functions that map
the database into these conditions. They will be indexical-functional conditions like
the-location-of-the-cup-containing-the-ping-pong-ball and
is-there-a-unique-cup-containing-a-ping-pong-ball.

It  might  be  reasonable  to  write  these  functions  in  Ruler,  although,  because  these  are 
pure  functions,  we  wouldn’t  need  the  state-update  facilities.  The  standard  world  conditions 
can  be  specified  ahead  of  time,  but  much  of  the  specification  of  this  task  will  depend  on 
the  specific  conditions  that  need  to  be  tested  in  order  to  carry  out  particular  strategies. 
We  will  use  a  simple  compositional  language  to  specify  these  conditions  rather  than  using 
atomic  names. 
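
Because the projection functions are pure, they can be written as plain functions of the database contents. The sketch below implements the two example conditions named above over an assumed database layout (a dictionary with `types`, `relations`, and `locations` keys); that layout and the helper `cups_containing_ball` are hypothetical and say nothing about how the conditions would be expressed in Rex or Ruler.

```python
def cups_containing_ball(db):
    """Names of all cups that the database says contain a ping-pong ball."""
    return {
        container
        for (relation, thing, container) in db["relations"]
        if relation == "in"
        and db["types"].get(thing) == "ping-pong-ball"
        and db["types"].get(container) == "cup"
    }

def is_there_a_unique_cup_containing_a_ping_pong_ball(db):
    """True when exactly one cup is known to contain a ping-pong ball."""
    return len(cups_containing_ball(db)) == 1

def the_location_of_the_cup_containing_the_ping_pong_ball(db):
    """Location of that cup, or None when the description does not pick out one cup."""
    cups = cups_containing_ball(db)
    if len(cups) != 1:
        return None
    return db["locations"][next(iter(cups))]

# Hypothetical database contents.
db = {
    "types": {"cup1": "cup", "cup2": "cup", "ball1": "ping-pong-ball"},
    "relations": [("in", "ball1", "cup2")],
    "locations": {"cup1": (0.1, 0.2), "cup2": (0.6, 0.4)},
}
unique = is_there_a_unique_cup_containing_a_ping_pong_ball(db)
location = the_location_of_the_cup_containing_the_ping_pong_ball(db)
```

Both conditions share one helper, which hints at why a compositional language is preferable to a separate atomic name for every condition.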


1.3  Interfaces  and  Implementations 


1.3.1  Machines  and  languages 

Perceptual modules: Initially, stereo and motion modules will be implemented on
Natasha (Symbolics Lisp machine), and shape and other capabilities will be implemented
on Boris (DECstation 3100). Since there is no frame buffer capability on Boris,
Wayback (Sun 2/120) will be hooked up as a frame buffer server passing portions of
image to Boris across the ethernet.

Figure 1.3: Processes and machines.

Recognition: Recognition will be implemented in Rex, to run as part of a monolithic
process  (incorporating  the  database,  database  projection,  and  action  rules)  on  Boris. 

Database:  The  database  will  probably  be  implemented  in  C,  but  will  be  linked  into  the 
Rex  module  running  on  Boris.  Nathan  has  described  a  technique  for  generating 
a  shell  database  by  defining  inputs  and  outputs  and  a  minimal  Rex  program,  and 
causing  the  Rex  compiler  to  generate  the  shell  which  will  then  be  filled  in. 

Database projection: The database projection rules will be written in Rex and incorporated
into the main process.

Actions:  Each  of  the  actions  modules  will  be  written  as  a  set  of  Gapps  rules,  which  will 
then  be  integrated  by  the  Gapps  compiler  into  the  main  process  running  on  Boris. 

1.3.2  Synchronization  issues 

We  have  decided  to  maintain  the  asynchronous  behaviour  between  the  different  processes 
of  the  system,  as  it  offers  significant  simplifications  in  communications  protocols,  and  offers 
possibilities  for  running  the  system  in  a  fault  tolerant  manner. 

However, where multiple processes are running on the same machine (Boris), there was
a question as to whether the processes should communicate directly to avoid pathological
timesharing behaviour, or whether indirect synchronization or no synchronization would
suffice.

No synchronization: The round-robin timesharing operating system would ensure
that each process obtained its share of the resources over time, but it was questioned
whether the granularity of such sharing would be satisfactory for maintaining
real-time performance.

Indirect synchronization: One possibility would be to have the cooperating
processes simply relinquish control of the CPU at suitable intervals by sleeping for a few
milliseconds, enabling the other process to resume action. Suitable points would be
at the beginning or end of each Rex tick, and between working on different perception
queries.

Direct  synchronization:  Apart  from  hacking  the  scheduler  (or  obtaining  a  real  time 
operating  system  with  the  necessary  facilities  already  present),  other  more  plausible 
solutions  include  synchronizing  the  two  processes  by  means  of  pipes,  signals,  or  other 
IPC  mechanisms. 

Desirable properties of the mechanism chosen include satisfactory performance and
maintenance of modularity between the different processes. It would be good if the
processes could be moved to alternative processors without changing the code too much.

The hope is that with Rex sleeping after each tick up to the declared tick time, the
scheduler will be able to give the perception processes sufficient compute time for the
system to work. If this turns out not to work, we can either switch back to having Rex call
the perception processes explicitly or have some synchronizing message passing between
the conferring processes.
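The "sleep up to the declared tick time" rule can be sketched as a small helper. This is only an illustration under stated assumptions: the name remaining_sleep_ms and the use of milliseconds are hypothetical, and the report does not say what happens when a tick overruns; here the sleep is simply skipped.

```c
#include <assert.h>

/* Hypothetical helper (not from the report): how long, in milliseconds, the
 * Rex process should sleep after a tick whose work took elapsed_ms, given a
 * declared tick period of tick_ms. An overrunning tick yields 0, so the next
 * tick starts immediately (an assumed policy). */
long remaining_sleep_ms(long tick_ms, long elapsed_ms)
{
    assert(tick_ms > 0 && elapsed_ms >= 0);
    return (elapsed_ms < tick_ms) ? (tick_ms - elapsed_ms) : 0;
}
```

The returned value would then be handed to whatever sleep call the operating system provides, which is the point at which the scheduler can run the perception processes.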

1.3.3  Interfaces 

Most  of  the  interfaces  between  the  modules  of  Figure  1.2  will  be  internal  data  structures 
within  Rex.  The  definition  phase  will  determine  the  nature  of  the  data  exchanged  across 
such  interfaces. 

A number of external interfaces between processes and machines also need to be
designed.

1.3.3.1 Recognition and Perception (interfaces A, B, D)

This  section  explains  how  we  expect  code  written  in  Rex  to  control  and  get  information  from 
a  set  of  perceptual  processes  each  capable  of  performing  some  parametrized  calculation  on 
some  input  unavailable  to  Rex.  It  is  designed  to  handle  situations  in  which  many  questions 
can  be  answered  in  one  tick,  as  well  as  situations  in  which  it  takes  many  ticks  to  answer  a 
single  question. 

The  general  model  is  that  each  tick  Rex  will  ask  up  to  a  fixed  number  of  questions  in  a 
prioritized  order.  Each  tick  it  will  also  receive  up  to  a  fixed  number  of  answers  to  questions 
that  it  asked  sometime  in  the  past. 

The  questions  are  sent  to  the  Rex  Execution  Environment,  hereafter  RexEx,  in  the 
usual  way.  (Note:  In  the  current  implementation  the  Rex  code  is  actually  called  directly 
by  RexEx  and  hence  is  just  a  part  of  the  RexEx  process.) 

RexEx takes each question and tags it with the tick number in which the question was
asked. It then sends the questions, in priority order, to whichever process it considers
appropriate, as a packet through a socket. Using sockets allows the perception processes
to run either on separate machines connected to the machine running RexEx or as separate
processes on the same machine. Note that it is also possible to have a single process
that answers the questions for more than one socket. Initially the different processes will
answer very different types of questions, so questions will probably be dispatched based
simply on a type field in the question from Rex. In the future, if we ever have a system
where more than one perceptual process is capable of answering the same question, a more
complex dispatching system could be devised. After dispatching all the questions asked by
Rex, RexEx sends a null question packet to all the perception processes that were not asked
any question. As with the other question packets, the null question packets include the
current tick number.

Each  perception  process  maintains  a  queue  of  questions.  When  a  perception  process 
has  an  empty  queue  it  blocks,  waiting  for  new  questions  to  arrive  on  its  socket  from  RexEx. 
When  a  question  is  read  by  a  perception  process,  any  questions  in  its  queue  from  earlier 
ticks  are  pitched  and  the  new  question  is  put  on  its  queue.  Once  the  socket  is  empty  the 
process  takes  off  the  first  question  in  the  queue  and  starts  working  on  it.  When  it  is  done 
it  sends  the  answer  back  to  RexEx,  looks  for  new  questions  and  starts  over.  Note  that  since 
null  packets  are  labelled  with  their  tick  numbers,  they  cause  the  queue  to  be  purged. 
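The queue discipline just described can be sketched as follows. This is a minimal illustration, not the project's code: the Question layout, the capacity of 16, and the use of type code 0 for a null question are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16  /* assumed bound, not from the report */
#define NULL_QUESTION  0   /* assumed type code marking a null question */

typedef struct {
    int tick;  /* Rex tick on which the question was asked */
    int type;  /* question type; NULL_QUESTION marks a null packet */
} Question;

typedef struct {
    Question items[QUEUE_CAPACITY];
    size_t   count;
} QuestionQueue;

/* Handle one packet read from the socket: discard queued questions from
 * earlier ticks, then enqueue the packet unless it is a null question. */
void queue_receive(QuestionQueue *q, Question incoming)
{
    size_t kept = 0;
    assert(q != NULL);
    for (size_t i = 0; i < q->count; i++) {
        if (q->items[i].tick >= incoming.tick) {
            q->items[kept++] = q->items[i];
        }
    }
    q->count = kept;
    if (incoming.type != NULL_QUESTION && q->count < QUEUE_CAPACITY) {
        q->items[q->count++] = incoming;
    }
}
```

Because a null packet carries a tick number but is never enqueued, receiving one empties the queue of everything asked on earlier ticks, which is the purging behaviour noted above.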

After  sending  out  the  questions,  RexEx  looks  for  any  answers  that  might  have  arrived 
from  the  perception  processes.  If  more  answers  arrive  on  a  given  tick  than  can  be  sent  to 
Rex,  the  oldest  answers  are  discarded.  A  more  complex  selection  scheme  may  need  to  be 
developed  if  it  turns  out  that  we  are  discarding  a  lot  of  answers. 
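A sketch of the "discard the oldest answers" rule, assuming that age is judged by the tick number tagged on the original question; the Answer layout and the function name keep_newest are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int tick;   /* tick on which the corresponding question was asked */
    int value;  /* placeholder for the answer payload */
} Answer;

/* Trim the array in place to at most limit answers by repeatedly removing
 * the one with the smallest (oldest) tick number. The relative order of the
 * surviving answers is preserved. Returns the new count. */
size_t keep_newest(Answer *a, size_t count, size_t limit)
{
    assert(a != NULL || count == 0);
    while (count > limit) {
        size_t oldest = 0;
        for (size_t i = 1; i < count; i++) {
            if (a[i].tick < a[oldest].tick) {
                oldest = i;
            }
        }
        for (size_t i = oldest; i + 1 < count; i++) {
            a[i] = a[i + 1];
        }
        count--;
    }
    return count;
}
```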

We considered a number of more complex schemes for handling overruns of questions
going to the perception processes or answers coming back to Rex. The systems we
considered involved giving each question a unique ID number and/or a priority number over
some small range.

1.3.3.2 Framebuffer server to Perception (interface C)

This will be a two-way Ethernet protocol, in which perception indicates the subimage
desired, from which frame buffer, and at what resolution. The existing UDP packet protocol
could be used, subject to packet size constraints; alternatively, some higher-level protocol
may be used. This will be decided in the next phase.

Chapter 2

Level  1  Demonstration 


2.1  Task  Definition 

This section gives an external specification of the desired abilities of the level 1
demonstration. Note that it is not sufficient to simply achieve these abilities; the code
should be structured according to the general philosophy and methodology of this project
and be amenable to extension into the next phases of the demonstration.

The system should be able to satisfy goals of the form

{ach | maint} (⟨verb⟩ ⟨noun⟩ ⟨noun⟩)

in which ⟨verb⟩ ranges over the set {touching, grasping} and ⟨noun⟩ ranges over the set
{hand, table, cup, ping-pong ball}. The things in the verb category are binary
predicates; the nouns, unary predicates (note, not individuals). The semantics are existential;
that is, ach (grasping hand cup) means that some hand should be grasping some cup. The
semantics will be made more formal in the section on $\mathcal{L}_1$.

This  definition  should  help  the  designer  of  each  module  to  determine  what  its  inputs 
and  outputs  must  be  in  order  to  enable  the  desired  top-level  competences. 

The  particular  top-level  goal  of  the  agent  will,  initially,  be  specified  at  compile  time, 
but  it  should  be  a  simple  extension  to  allow  top-level  goals  of  this  kind  to  be  entered  by  a 
user  at  the  keyboard. 


2.2  Language  and  Database  Definitions 

2.2.1 $\mathcal{L}_1$

This  section  contains  a  preliminary  specification  for  the  language  to  be  used  in  specifying 
goals  and  internal  conditions  for  the  Level  1  demonstration. 

2.2.1.1 Abstract $\mathcal{L}_1$

First order language over the following symbols:

Constants:          table1, robot-hand1, camera1
Unary predicates:   table, robot-hand, camera, cup, ping-pong ball
Binary predicates:  grasping, touching, in
Unary functions:    weight
Binary functions:   relative pose


Obviously, auxiliary predicates will be required for expressing any interesting conditions
and manipulation strategies. E.g., how do we express the fact that the cup is upside down?

$$\mathrm{cup}(c) \wedge \mathrm{upside\text{-}down}(c)$$

or

$$\mathrm{surface}(b) \wedge \mathrm{table}(t) \wedge \mathrm{cup}(c) \wedge \mathrm{bottom\text{-}of}(b, c) \wedge \mathrm{touching}(b, t)$$

2.2.1.2 Concrete $\mathcal{L}_1$: $C\mathcal{L}_1$

To serve as a concrete language for programming, we have to define the Gapps
expressions that express conditions abstractly defined by $A\mathcal{L}_1$. Gapps has two
places where “conditions” or propositions play a role:

1. as the first operand in an if expression: (if condition subgoal1 subgoal2); and

2. as the operand of an ach or a maint expression, with the added complexity that the
condition is factored into a compile-time tag and run-time parameters: (ach/maint
tag-of-condition parameters-of-condition).

2.2.1.2.1 Testable Conditions The first case is easier to deal with: we can use
exactly the first order language $A\mathcal{L}_1$, with quantifiers interpreted as finite
quantification over indices in the data base (rather than real-world objects; a major pun).

So, we can implement a Rex function, db-test, which takes a Lisp s-expression that
encodes a statement in $A\mathcal{L}_1$ and returns true, false, or dont-know. It is possible
to refer to individuals in these expressions by using a database index, obtained by calling
the function a with a variable name and an expression in $A\mathcal{L}_1$ in which that
variable occurs free. (For example, (a 'x (cup x)).) It will be useful to augment
$A\mathcal{L}_1$ with the quantifier $\exists!$; this obviates the need for a the function,
as well.
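Finite quantification with a three-valued result can be illustrated as below. The report does not say how dont-know combines under a quantifier; this sketch assumes the usual strong Kleene reading (an existential is true if any instance is true, false only if every instance is false). The type DbTruth and the function db_exists are hypothetical names, and the sketch is in C rather than Rex.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DB_FALSE, DB_TRUE, DB_DONT_KNOW } DbTruth;

/* holds[i] is the result of testing the quantified condition on database
 * index i. Returns the three-valued truth of "there exists an index at
 * which the condition holds" (assumed strong Kleene semantics). */
DbTruth db_exists(const DbTruth *holds, size_t n)
{
    DbTruth result = DB_FALSE;
    assert(holds != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (holds[i] == DB_TRUE) {
            return DB_TRUE;          /* one witness is enough */
        }
        if (holds[i] == DB_DONT_KNOW) {
            result = DB_DONT_KNOW;   /* cannot rule a witness out */
        }
    }
    return result;
}
```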

2.2.1.2.2 Goal Conditions The second case is trickier, since the goal reduction rules
don’t just “evaluate” the condition, they trigger off its syntactic shape. For now, define a
subset of goals

(ach/maint (⟨binary⟩ ⟨unary⟩ ⟨unary⟩) [v0 v1 v2])

where the condition encoded by the ordered pair ⟨(r p1 p2), [ ]⟩ is
$\exists x, y.\; p_1(x) \wedge p_2(y) \wedge r(x, y)$.
Far from general, but a way of getting started. (This is the interpretation used in the
section describing the external goals for this level.)

2.2.2 Database Definition


Let LP and LR be lattices with elements finitely representable and with operations
$\sqcap$ (meet), $\sqcup$ (join), and $\models$ (entails). These lattices can be primitive
lattices of several types:

•  finite lattices defined explicitly (e.g., type hierarchies)

•  interval lattices with elements $[x, y]$, where

$$[x_1, y_1] \sqcap [x_2, y_2] = [\max(x_1, x_2), \min(y_1, y_2)].$$

•  product lattices with elements $[x_1, \ldots, x_n]$, where

$$[x_1, \ldots, x_n] \sqcap [y_1, \ldots, y_n] = [x_1 \sqcap y_1, \ldots, x_n \sqcap y_n].$$
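The interval and product meets can be written down directly. The sketch below assumes real-valued interval bounds and treats a result with lo > hi as an empty (inconsistent) interval; both choices, and all names, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Closed interval [lo, hi]. A value with lo > hi is treated here as an
 * empty, inconsistent interval (an assumption of this sketch). */
typedef struct { double lo, hi; } Interval;

/* Meet of two intervals: [max(x1, x2), min(y1, y2)]. */
Interval interval_meet(Interval a, Interval b)
{
    Interval m;
    m.lo = (a.lo > b.lo) ? a.lo : b.lo;
    m.hi = (a.hi < b.hi) ? a.hi : b.hi;
    return m;
}

int interval_is_empty(Interval a)
{
    return a.lo > a.hi;
}

/* Meet in a product lattice of n intervals, taken componentwise. */
void product_meet(const Interval *x, const Interval *y, Interval *out, size_t n)
{
    assert(x != NULL && y != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = interval_meet(x[i], y[i]);
    }
}
```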

Intuitively,  the  elements  of  LP  represent  primitive  unary  properties  that  can  hold  of 
objects  in  the  robot’s  world,  and  the  elements  of  LR  represent  primitive  binary  relations. 
(The  non-primitive  ones  will  be  computed  from  the  database  by  database  projection.) 

In addition to meet, join, etc., we assume we are also given the following operations on
lattice elements:

$\mathrm{deg1} : LP \to LP$

$\mathrm{deg2} : LR \to LR$

$\mathrm{triang} : LR \times LR \to LR$

Let 0 (1) be the minimal (maximal) elements of some lattice determined by context.
Let $DB = [P, R]$, where $P$ is a vector of size $n$ with elements drawn from LP, and $R$ is
an $n \times n$ array with elements drawn from LR. Let $IN$ be the input variable, taking as
values triples $(u, v, w)$ where $u \in LP$ and $v$, $w$ are each $n$-tuples of elements of
LR. (Intuitively, $u$ represents a unary description of an object; $v$ and $w$ represent the
relations between it and the other objects. Most of $v$ and $w$ can be 0, possibly excepting
the object's relation to the camera and/or the arm. Obviously, we could also have a vector
of INs.)

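We can define the DB component’s next-state function as

$$f(DB, IN) = \mathrm{infer}(\mathrm{insert}(IN, \mathrm{purge}(\mathrm{degrade}(DB))))$$

with the following function definitions.

Degrade:

$$\mathrm{degrade}([P, R]) = [P', R'],$$

where, for all $i$,

$$P'[i] = \mathrm{deg1}(P[i]),$$

and for all $i, j$,

$$R'[i, j] = \mathrm{deg2}(R[i, j]).$$

Purge:

$$\mathrm{purge}([P, R]) = [P', R']$$

Let $m$ be the index of the least important object in DB; then

$$P'[i] = \begin{cases} 0 & \text{if } i = m \\ P[i] & \text{otherwise} \end{cases}$$

$$R'[i, j] = \begin{cases} 0 & \text{if } i = m \\ 0 & \text{if } j = m \\ R[i, j] & \text{otherwise} \end{cases}$$

Insert:

$$\mathrm{insert}((u, v, w), [P, R]) = [P', R']$$

Let $m$ be the index of the least important object in DB (the index just purged); then

$$P'[i] = \begin{cases} u & \text{if } i = m \\ P[i] & \text{otherwise} \end{cases}$$

$$R'[i, j] = \begin{cases} v[j] & \text{if } i = m \\ w[i] & \text{if } j = m \\ R[i, j] & \text{otherwise} \end{cases}$$

Infer:

$$\mathrm{infer}([P, R]) = \mathrm{infer1}(\mathrm{infer1}(\ldots \mathrm{infer1}([P, R]) \ldots))$$

$$\mathrm{infer1}([P, R]) = \mathrm{propagate}(\mathrm{merge}([P, R]))$$

Merge:

$$\mathrm{merge}([P, R]) = [P', R']$$

Let $i, j$ be the first mergeable pair (i.e., the first pair such that $R[i, j] \models \ldots$). For all $k$,

$$P'[i] = P[i] \sqcap P[j]$$

$$R'[i, k] = R[i, k] \sqcap R[j, k]$$

$$R'[k, i] = R[k, i] \sqcap R[k, j]$$

Propagate:

$$\mathrm{propagate}([P, R]) = [P, R']$$

For all $k$,

$$R'[i, j] = R[i, j] \sqcap \mathrm{triang}(R[i, k], R[k, j])$$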
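To make the propagate step concrete, the sketch below instantiates LR as a one-dimensional interval lattice. The report leaves triang abstract; here it is assumed to be interval addition (bounds on the offset from i to k plus bounds on the offset from k to j bound the offset from i to j), and "for all k" is read as taking the meet over every intermediate object. Both are assumptions, as are the names and the size N = 3.

```c
#include <assert.h>

#define N 3  /* number of database objects in this sketch */

typedef struct { double lo, hi; } Interval;

static Interval meet(Interval a, Interval b)
{
    Interval m = { a.lo > b.lo ? a.lo : b.lo, a.hi < b.hi ? a.hi : b.hi };
    return m;
}

/* Assumed triang: compose two relative-offset bounds by interval addition. */
static Interval triang(Interval a, Interval b)
{
    Interval s = { a.lo + b.lo, a.hi + b.hi };
    return s;
}

/* One propagation pass: every R'[i][j] is R[i][j] tightened by the
 * composition through each intermediate object k, read from the old R. */
void propagate(Interval in[N][N], Interval out[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            Interval r = in[i][j];
            for (int k = 0; k < N; k++) {
                r = meet(r, triang(in[i][k], in[k][j]));
            }
            out[i][j] = r;
        }
    }
}
```

With a loose bound of [0, 100] between objects 0 and 2, and tight bounds of [1, 2] and [3, 4] along the path through object 1, one pass narrows the direct relation to [4, 6].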


2.3  Interface  Definitions 

2.3.1  Perception  to  Recognition:  Vision  tools  on  Natasha 

2.3.1.1 General interface notes

2.3.1.1.1 Port numbers

Natasha  system  control  port:  1024  +  442  =  1466 
Natasha  receive  port:  1024  +  444  =  1468 
Natasha  reply  port:  1024  +  445  =  1469 

2.3.1.1.2 Datagram formats To start the system, send a one-word datagram to
Natasha’s system control port containing the number 1. This starts a server which will
then handle requests and respond on the receive and reply ports. To stop the system, send
a one-word datagram to the system control port containing the number 0.

Request  datagram  packets  begin  with  a  tick  number,  followed  by  a  single  code  word 
indicating  type  of  request,  followed  by  additional  words  specific  to  the  type  of  request 
being  made. 

Reply  datagrams  will  begin  with  the  code  word  and  parameter  words  from  the  request 
being  replied  to,  followed  by  additional  words  specific  to  the  type  of  request. 
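The request layout (tick number, code word, then request-specific words) can be sketched as a buffer-filling routine. The report does not state the word size or byte order; this sketch assumes 16-bit signed words, suggested only by the -32768 sentinel used in the stereo format, and leaves byte order to the sender. The function name build_request and the enum names are hypothetical; the numeric codes come from the tool descriptions in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t Word;  /* assumed word size; not stated in the report */

enum { SYSTEM_STOP = 0, SYSTEM_START = 1 };     /* system control values */
enum { REQ_MOTION = 100, REQ_STEREO = 200 };    /* request code words */

/* Fill buf with a request: tick number, code word, then nparams extra
 * words. Returns the number of words written, or 0 if buf is too small. */
size_t build_request(Word *buf, size_t cap, Word tick, Word code,
                     const Word *params, size_t nparams)
{
    if (buf == NULL || cap < 2 + nparams) {
        return 0;
    }
    buf[0] = tick;
    buf[1] = code;
    for (size_t i = 0; i < nparams; i++) {
        buf[2 + i] = params[i];
    }
    return 2 + nparams;
}
```

A stereo request, for instance, would pass four parameter words (x, y, z estimate, patch diameter), while a motion request passes none.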

2.3.1.1.3 Scheduler behavior Once started by the start system message, the
sign-correlation scheduler will monitor incoming datagrams on Natasha’s receive port. Packets
will be read out of the input buffer until the last one is found. All packets with a tick
number smaller than that of the last packet (unless packets can come out of order) will
be discarded. The most recent packet will be decoded and dispatched to the appropriate
measurement tool. The resulting measurement will then be returned to the requester and
this cycle will be repeated.

Note that the input buffer seems to have a capacity of about 40 short datagrams.
When it is full it ignores new datagrams. Thus if too many new requests are sent while
Natasha is working on an earlier one, the 41st and later requests will be discarded.

2.3.1.2 Motion Tool

The motion tool will measure motion over its full field of view and report the position,
velocity, and approximate size of the n fastest moving regions. (n will be a small number
in the range 1 to 4.)

1.  Motion  request  datagram  format 

Tick  number  The  number  of  the  Rex  tick  on  which  this  message  was  generated. 
Request  code  =  100.  Code  for  a  motion  measurement. 

2.  Motion reply datagram format

Tick number The number of the Rex tick on which this message was generated.
Request code = 100. Code for a motion measurement.
motion vector 5 word motion cluster.

(a)  x  position  in  mm  relative  to  calibration  plane  origin 

(b)  y  position 

(c)  x  velocity  in  mm  per  second  relative  to  calibration  plane 

(d)  y  velocity 

(e)  cluster  area  in  square  mm  at  calibration  plane 

2.3.1.3  Stereo  Tool 

The stereo tool will make an m by m set of range measurements about a specified location
in the visual field, with m on the order of 4 or 5. It is capable of taking advice on the
expected range to the surface at that location. The tool will return a confidence measure
for the measurement, the average range to the surface, the gradient of the surface, its x
and y curvature, and the disparity range searched.

1.  Stereo  request  datagram  format 

Tick  number  The  number  of  the  Rex  tick  on  which  this  request  was  generated. 
Request  code  =  200.  Code  for  a  stereo  measurement. 

x  position  center  position  of  measurement  in  mm  relative  to  calibration  surface 
origin. 

y  position  center  position  of  measurement  in  mm  relative  to  calibration  surface 
origin. 

z  position  estimate  in  mm  relative  to  calibration  surface;  -32768  means  no  estimate 
available. 

measurement  patch  diameter  in  mm  relative  to  calibration  surface. 

2.  Stereo  reply  datagram  format 

Tick  number  The  number  of  the  Rex  tick  on  which  this  request  was  generated. 
Request  code  =  200.  Code  for  a  stereo  measurement. 

x  position  center  position  of  measurement  in  mm  relative  to  calibration  surface 
origin. 

y position center position of measurement in mm relative to calibration surface
origin.

z position estimate in mm relative to calibration surface; -32768 means no estimate
available.

measurement patch diameter in mm relative to calibration surface.

failure code if not zero, there was a problem, indicated by the following code
numbers.

1  requested  measurement  lies  outside  of  active  camera  field  of  view. 

2  failed  to  measure  a  good  correlation  over  the  range  searched 

confidence  0  to  100  scale  with  100  the  highest  confidence  in  following  measurement 
data. 

average range Height above (negative = below) calibration plane in mm.
Calibration plane position established at system calibration time.

x gradient of surface In mm per mm on calibration plane.

y gradient of surface In mm per mm on calibration plane.

x curvature of surface In mm per mm per mm at calibration plane.

y curvature of surface In mm per mm per mm at calibration plane.

condition  codes  set  bits  in  this  word  indicate  following  conditions  (bit  zero  is  least 
significant). 

0  dropouts  seen  at  positive  y  side  of  region 

1  dropouts  seen  at  negative  y  side  of  region 

2  dropouts  seen  at  positive  x  side  of  region 

3  dropouts  seen  at  negative  x  side  of  region 
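The failure code and condition-code word of the stereo reply can be decoded with a few masks. The numeric values below follow the lists above; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Failure code values from the stereo reply format. */
enum StereoFailure {
    STEREO_OK             = 0,
    STEREO_OUTSIDE_VIEW   = 1,  /* measurement outside active field of view */
    STEREO_NO_CORRELATION = 2   /* no good correlation over range searched */
};

/* Condition-code bits; bit zero is least significant. */
enum StereoCondition {
    DROPOUT_POS_Y = 1 << 0,
    DROPOUT_NEG_Y = 1 << 1,
    DROPOUT_POS_X = 1 << 2,
    DROPOUT_NEG_X = 1 << 3
};

/* Nonzero if the given condition bit is set in the condition word. */
int has_condition(unsigned condition_word, unsigned flag)
{
    return (condition_word & flag) != 0;
}

/* Number of sides of the region (0 to 4) on which dropouts were seen. */
int dropout_side_count(unsigned condition_word)
{
    int count = 0;
    for (unsigned bit = 0; bit < 4; bit++) {
        count += (int)((condition_word >> bit) & 1u);
    }
    return count;
}
```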

2.3.2 Perception to Recognition: Vision tools on Boris (or Wayback)

2.3.2.1 General interface notes

2.3.2.1.1 Port Numbers

Boris  Shape  tool  requests  sent  to  port:  boris:(1024  +  450) 

Boris  Shape  tool  replies  returned  to  port:  boris:(1024  +  451) 

Wayback  shape  tool  requests  sent  to  port:  wayback:(1024  +  272) 

Wayback shape tool replies returned to port: boris:(1024 + 273)

2.3.2.1.2 Scheduling The shape tool will await messages from its assigned port in
a  sleeping  state,  permitting  other  processes  on  the  machine  to  run.  When  one  or  more 
packets  are  received,  all  available  packets  are  read,  the  latest  (largest)  tick  number  is  noted, 
and  all  packets  having  a  tick  number  less  than  this  are  discarded.  The  shape  tool  will  then 
work  on  the  remaining  requests  in  the  reverse  of  the  order  in  which  they  were  received. 
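The selection rule (keep only packets carrying the latest tick, then process them in reverse order of receipt) can be sketched as follows. The function name schedule_requests and the index-based interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ticks[i] is the tick number of the i-th packet in order of receipt.
 * Writes to order[] the indices of the packets to process, in processing
 * order: only packets with the latest tick survive, newest arrival first.
 * order must have room for count entries. Returns the number kept. */
size_t schedule_requests(const int *ticks, size_t count, size_t *order)
{
    size_t kept = 0;
    int latest;

    if (count == 0) {
        return 0;
    }
    assert(ticks != NULL && order != NULL);

    latest = ticks[0];
    for (size_t i = 1; i < count; i++) {
        if (ticks[i] > latest) {
            latest = ticks[i];
        }
    }
    for (size_t i = count; i-- > 0; ) {
        if (ticks[i] == latest) {
            order[kept++] = i;
        }
    }
    return kept;
}
```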

The shape tool will sleep for 1 ms at intervals to be agreed upon, and between processing
packets, in order to let other processes running on the machine obtain a time slice more
frequently than the scheduler might otherwise enforce.

2.3.2.1.3 Datagram formats Datagrams to the shape tool will take the form of a
request code and a tick number, followed by a series of parameters according to the
following definitions.

typedef int Coord;   /* Pixels */
typedef int Coord2;  /* Square pixels */
typedef int Angle;   /* RAUs (4096 per circle) */

typedef enum
{
    shape_status     = 0,  /* Trigger a null response. */
    shape_axes       = 1,  /* Ask for orientation of major/minor axes */
    shape_parameters = 2,  /* Ask for centre of blob */
    shape_signature  = 3,  /* Ask for signature of blob */
    /* Future extensions. */
} ShapeToolRequestCode;

typedef enum
{
    mode_normal = 0,  /* Default segmentation mode */
} SegmentationMode;

struct ShapeToolRequestPacket
{
    int                  tick;  /* Monotonic tick number. */
    ShapeToolRequestCode code;  /* Request code. */
    SegmentationMode     mode;  /* Unused. */
    Coord                x;     /* Coords of region to look near. */
    Coord                y;
    Coord                w;     /* Size of region to look in. */
    Coord                h;
};

typedef enum
{
    shape_status_null      = 0,  /* Empty response to any query */
    shape_status_blank     = 1,  /* Found no shape to analyse */
    shape_status_ok        = 2,  /* Found shape to analyse */
    shape_status_truncated = 3,  /* Shape may extend beyond portion analysed. */
} ShapeToolStatusCode;

struct ShapeToolResponsePacket
{
    int                  tick;    /* Tick number from request. */
    ShapeToolRequestCode code;    /* Request code. */
    SegmentationMode     mode;    /* Unused. */
    Coord                x;       /* Coords of region to look near. */
    Coord                y;
    Coord                w;       /* Size of region to look in. */
    Coord                h;
    ShapeToolStatusCode  status;
    unsigned             trunc;   /* Unused */
    union
    {
        /* Response to shape_parameters */
        struct
        {
            Coord  cent_x;  /* Centre of area */
            Coord  cent_y;
            Coord2 area;    /* Area */
            Coord  perim;   /* Perimeter length */
            Coord  major;   /* Length of major axis */
            Coord  minor;   /* Length of minor axis */
            Angle  orient;  /* of major axis ccw from y=0 */
        } shape_parameters;
        /* Response to shape_signature */
        struct
        {
            Coord x;         /* Position of peak */
            Coord y;
            int   strength;  /* Height of peak */
        } signature[SignatureCount];  /* Report up to SignatureCount peaks */
    } response;
};

2.3.2.2 Shape tool

The shape tool will analyse a rectangular region of the image centered at x, y, with size
2*w x 2*h. The region will be segmented according to the Segmentation mode field:
currently only a default mode is defined.
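The field geometry can be sketched as below. The report gives the centre and the overall size but not the exact pixel bounds; this sketch assumes the field covers columns x-w through x+w-1 and rows y-h through y+h-1, so that it is exactly 2*w by 2*h pixels. The edge test anticipates the truncation rule given in the list that follows. All names are hypothetical.

```c
#include <assert.h>

/* Inclusive pixel bounds of a rectangle. */
typedef struct { int left, top, right, bottom; } Rect;

/* Field analysed for a request centred at (x, y) with half-sizes w and h.
 * Assumed convention: 2*w columns and 2*h rows, starting at x-w and y-h. */
Rect request_field(int x, int y, int w, int h)
{
    assert(w > 0 && h > 0);
    return (Rect){ x - w, y - h, x + w - 1, y + h - 1 };
}

/* Nonzero if the blob's bounding box reaches the edgemost row or column
 * of the field, in which case the blob may continue outside the field. */
int touches_field_edge(Rect field, Rect blob)
{
    return blob.left   <= field.left  ||
           blob.top    <= field.top   ||
           blob.right  >= field.right ||
           blob.bottom >= field.bottom;
}
```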

•  A shape_status request will cause a packet with status_null to be returned.

•  If no region is found, the status parameter will be set to status_blank.

•  If there is a region, the status parameter will be set to status_ok. If there is more
than one region in the field, the report will describe one of the regions. This might
be the largest region entirely enclosed within the field.

•  If the region reported upon has pixels in the edgemost row or column of the field,
it cannot be determined whether the region extends beyond the field, and the
status parameter will be set to status_truncated. A future extension might use
the trunc field to report which edges or corners of the field cut across the region.

2.3.2.2.1 Request shape parameters This request will ask for the centre position,
area, perimeter, and orientation and length of major and minor axes of the blob in the
region.


2.3.2.2.2 Request shape signature This request will ask for the raw signature of
the  region  shape:  this  will  be  the  coordinates  of  the  SignatureCount  highest  peaks  on  the 
region,  along  with  their  heights.  The  coordinates  might  be  object  centred,  and  the  heights 
might  be  normalized  so  that  the  largest  has  a  specified  value. 

2.3.3  Recognition  to  Database 

The  interface  between  the  database  and  recognition  components  will  consist  of  an  array 
of  records.  Each  record  represents  information  about  an  object,  including  a  set  of  unary 
properties,  such  as  type,  and  the  relation  of  the  object  to  each  of  the  landmark  objects. 
The  size  of  the  array  will  be  2  initially,  but  may  grow  as  recognition  gets  to  be  more  useful. 

Landmark  objects  are  things  that  are  unique  and  known  to  all  of  the  components.  We 
propose  the  following  set  of  landmark  objects:  gripper,  camera,  and  table.  The  table's 
frame  will  be  the  same  as  the  frame  of  the  robot,  which  is  centered  at  the  base.  This 
is  necessary  because  the  dead-reckoning  capabilities  of  the  arm  continuously  report  the 
location  of  the  gripper  in  the  coordinates  of  the  arm-base.  Thus,  if  we  know  the  location 
of,  say,  a  ping-pong  ball  with  respect  to  the  gripper  and  the  location  of  the  gripper  with 
respect  to  the  base,  we  can  deduce  the  location  of  the  ping-pong  ball  with  respect  to  the 
arm  base.  In  the  next  tick,  imagine  that  the  gripper  moves  with  respect  to  the  base,  but 
the  ping-pong  ball  does  not.  In  this  case,  we  will  have  new  information  about  the  relative 
position  of  the  gripper  and  the  base;  when  we  combine  that  information  with  the  relative 
position  of  the  ping-pong  ball  and  base,  we  will  have  current  information  about  the  relative 
position  of  the  ping-pong  ball  and  the  gripper. 
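The two-tick argument above can be worked through numerically. The sketch below uses positions only and assumes the gripper and base frames have the same orientation, so that composing poses reduces to vector addition; the real relative-pose representation (bounds on position and orientation) is not modelled. All names are hypothetical.

```c
#include <assert.h>

typedef struct { double x, y, z; } Vec3;

static Vec3 vec3_add(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

static Vec3 vec3_sub(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

/* Object position in the base frame, from its position in the gripper
 * frame and the gripper position in the base frame (aligned axes assumed). */
Vec3 to_base_frame(Vec3 obj_in_gripper, Vec3 gripper_in_base)
{
    return vec3_add(gripper_in_base, obj_in_gripper);
}

/* Inverse step: object position in the gripper frame. */
Vec3 to_gripper_frame(Vec3 obj_in_base, Vec3 gripper_in_base)
{
    return vec3_sub(obj_in_base, gripper_in_base);
}
```

On the first tick the ball is located in the base frame; on the next tick, with only the gripper having moved, the stored base-frame position and the new dead-reckoned gripper position give the current ball-to-gripper offset.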

The  gripper  also  has  a  standard  frame  of  reference  associated  with  it.  The  camera 
frame  will  be  centered  between  the  camera  centers  and  pointing  down  the  middle  (the 
average  of  the  directions  of  the  cameras). 

In  order  to  make  the  data  structures  uniform,  we  will  define  three  types:  property, 
relation,  and  recognition-data.  These  types  may  be  the  same  as  the  ones  that  are 
used  internally  in  the  database  (that  would  simplify  things),  but  need  not  necessarily  be 
the  same. 

The  property  data  type  will  have  the  following  fields: 

type The type of the object will be one of the primitive types or a disjunction of a set
of them. The primitive types will be: gripper, camera, table, pp-ball, and cup.

gripper-open  Is  the  object  open?  Can  only  be  true  of  the  gripper.  Needs  no  valid  bit 
because  we  always  know  whether  or  not  it  is  open.  It  is  sort  of  disgusting  to  have 
to  have  this  in  every  entry,  but  Rex  requires  fixed  data  types.  It’s  possible  that  we 
could  use  this  field  to  mean  different  things  for  different  types,  if  we  ever  need  to. 

The relation data type will have the following fields:

relative-pose Bounds on the position and orientation of the first object in the second
object’s frame of reference. Jeff is working on exactly how these bounds are to be
represented.

relative-velocity Bounds on the velocity of the first object in the frame of the second
object.

relative-force Bounds on the force exerted by the first object on the second. The gripper
will almost always be one of the participants in this relation when it is informative.
If we want to say more useful things, like what part of the object is exerting
force on the gripper, this field will have to get more complicated.

Finally,  the  recognition-data  data  type  will  have  the  following  fields: 

valid  A  Boolean  value  indicating  whether  the  remainder  of  the  record  is  valid.  This  allows 
the  recognition  component  to  not  fill  up  the  entire  set  of  records  if  it  doesn’t  have 
interesting  data  to  report. 

object-properties  A  value  of  type  property  giving  the  unary  properties  of  this  object. 

gripper-rel  A  value  of  type  relation  giving  the  relation  between  the  object  and  the 
gripper.  Order  is  important.  We  are  describing  the  object  in  the  gripper’s  frame.  If 
the  relations  are  invertible,  that’s  all  we  need  specify.  If  they  are  not,  we  will  have 
to specify the relation between the gripper and the object, as well. For now, assume
that they are inverses.

camera-rel  A  value  of  type  relation  giving  the  relation  between  the  object  and  the 
camera. 

table-rel  A  value  of  type  relation  giving  the  relation  between  the  object  and  the  table. 

2.3.4  Database  to  Database  Projection 

On each tick the database component will output its entire contents to the database
projection component. The database will be a Rex module, so its output will be in list form.
Specifically, it will be a list of Boolean and floating-point values encoding the various
properties of and relations between the objects in the database.

The beginning of the output list will correspond to the vector in the database, each
cell of which stores characteristics of a single object (type and mass, for instance). An
object’s type will be encoded by a vector of Booleans whose indices correspond to the
possible types. A Boolean in the vector will be set to true if the object might be of the
type associated with its index. For example, if the object is known to be a cup, only the
cup Boolean will be true, and if the object is known to be either a cup or a gripper, the
Booleans for cup and gripper will be true. For the first demo, each cell of the database
vector will contain a vector of five Booleans that encode an object’s type and one Boolean
indicating whether the gripper is open (only relevant for grippers). There will be five cells
in the vector to begin with, so the first section of the database’s output will be 30 Boolean
values.

The remainder of the output list will correspond to the two-dimensional array in the
database which stores the relations between objects. For the first demo, each cell [a, b] of
the array will contain twelve floating-point values which describe the position of object a
in b's frame of reference. There will be ten useful cells in this array at first (one for each
possible pair of objects), so this part of the output list will have 120 floating-point values.

The  database  projection  component  will  read  the  values  from  the  database’s  output  list 
into  data  structures  with  the  same  form  as  those  in  the  database.  The  information  stored 
in  these  structures  will  then  be  used  to  answer  the  action  component’s  queries  about  the 
objects. 
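The layout described above (30 Booleans followed by 120 floating-point values) can be sketched as a small decoder. This is only an illustration in Python, not the Rex implementation: the names `ObjectCell`, `PAIRS`, and `decode_output` are hypothetical, and the order in which the ten object pairs appear in the list is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

NUM_OBJECTS = 5                    # cells in the database vector (first demo)
TYPE_BITS = 5                      # Booleans encoding the possible object types
BOOLS_PER_OBJECT = TYPE_BITS + 1   # plus the gripper-open Boolean
FLOATS_PER_RELATION = 12           # floats describing a's position in b's frame

# Assumed ordering: one useful relation cell per unordered pair of objects.
PAIRS = [(a, b) for a in range(NUM_OBJECTS) for b in range(a + 1, NUM_OBJECTS)]


@dataclass
class ObjectCell:
    possible_types: Tuple[bool, ...]   # True where the object might be of that type
    gripper_open: bool                 # only meaningful for grippers


def decode_output(values: Sequence) -> Tuple[List[ObjectCell], Dict[Tuple[int, int], Tuple[float, ...]]]:
    """Split the flat output list into per-object cells and per-pair relations."""
    n_bools = NUM_OBJECTS * BOOLS_PER_OBJECT
    n_floats = len(PAIRS) * FLOATS_PER_RELATION
    if len(values) != n_bools + n_floats:
        raise ValueError(f"expected {n_bools + n_floats} values, got {len(values)}")

    objects = []
    for i in range(NUM_OBJECTS):
        cell = values[i * BOOLS_PER_OBJECT:(i + 1) * BOOLS_PER_OBJECT]
        objects.append(ObjectCell(tuple(bool(v) for v in cell[:TYPE_BITS]),
                                  bool(cell[TYPE_BITS])))

    relations = {}
    for k, pair in enumerate(PAIRS):
        start = n_bools + k * FLOATS_PER_RELATION
        relations[pair] = tuple(float(v) for v in values[start:start + FLOATS_PER_RELATION])
    return objects, relations
```

With five objects this yields five cells of six Booleans each and ten relation entries of twelve floats each, matching the counts given in the text.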


2.3.5  Database  Projection  to  Goal  Strategies 

For  the  Level  1  demonstration,  the  interface  between  the  arm  control  action  component 
and  the  database  will  be  a  language  based  on  a  simple  epistemic  logic.  Expressions  in  the 
language  will  evaluate  to  Rex  circuitry  that  extracts  the  truth  value  of  desired  conditions 
or  object  indices  from  the  database.  In  addition,  there  will  be  a  set  of  functions  that 
extract  parametric  information  about  database  objects  from  their  indices. 

2.3.5.1 Database Language

The  database  language  will  consist  of  a  small  set  of  Rex  functions  and  a  separate  language, 
used  by  these  functions,  that  can  generate  circuitry  that  accesses  the  database. 

Types:  The  database  language  uses  two  special  types. 

1. objs are indices into the database that correspond to some database object.

2. k-conds or knowledge conditions represent the system’s knowledge about a particular
condition. They can have three values, {1, 0, ⊥}, that roughly correspond
to ‘known to be true’, ‘known to be false’, and ‘unknown’.

Terms: Any Rex expression of type obj is a term. There is only one primitive function
that returns terms.

1. (a x k-exp) returns an obj or list of objs. x is an atom or list of atoms, and
k-exp is a knowledge expression as described below. Intuitively, a returns an
obj or a set of objs from the database that satisfy some condition. It finds the
set by sequentially binding the atoms in x to the objs in the database. an is
equivalent to a.

Condition Expressions: Condition expressions are of type k-cond. They can only appear
as the argument to the know function, described below, or as part of other
condition expressions. The bottom level condition expressions are a set of conditions
that can be directly tested on the database. There will be one domain independent
condition.

1. (equal term term) tests whether the two terms reference the same database
object, i.e. whether the indices are equal.

The  following  is  a  list  of  conditions  that  we  expect  to  implement  for  the  goals  of 
grasping  and  touching. 

1. (grasping term term) tests whether the first object is grasping the second
object. For the current demo this test can only be 1 or ⊥ if the first object is
the hand; if the first object is anything else, it must be 0.

2. (touching term term) tests whether the first object and the second object are
touching.

3. (can-grasp term term) tests whether there is sufficient space around the second
object for the first object to grasp it. This condition has the same constraints
as grasping.

4. (can-touch term term) tests whether there is sufficient space around the second
object for the first object to touch it.

5. (in term term) tests whether the first object is entirely within the bounds of
the second object.

6. (clear-between term term) tests whether there is sufficient space between the
objects such that the first object can move so it is touching the second object.

7. (fully-above term term) tests whether the lowest Z coordinate of the first
object is above the highest Z coordinate of the second object.

8. (above term term) tests whether the lowest Z coordinate of the first object is
above the lowest Z coordinate of the second object.

9. (xy-bounds-intersect term term) tests whether the projections of the two
objects onto the table intersect.

10.  (table  term)  tests  whether  the  object  is  a  table. 

11.  (robot-hand  term)  tests  whether  the  object  is  a  robot  hand. 

12.  (human-hand  term)  tests  whether  the  object  is  a  human  hand. 

13.  (hand  term)  tests  whether  the  object  is  a  hand. 

14.  (pp-ball  term)  tests  whether  the  object  is  a  ping-pong  ball. 

15.  (cup  term)  tests  whether  the  object  is  a  cup. 

16.  (soda-can  term)  tests  whether  the  object  is  a  soda  can. 

17.  (freezer-container  term)  tests  whether  the  object  is  a  freezer-container. 

18.  (container  term)  tests  whether  the  object  is  one  of  cup,  soda-can  or 
freezer-container. 

19.  (open  term)  tests  whether  the  object  is  open.  Can  only  be  known  if  the  object 
is a container.

20. (gripper-open term) tests whether the object’s gripper is open. Can only be known
if the object is a hand.

21. (moveable term) tests whether a given object can potentially move. It is determined
just from the type of the object and hence does not take into consideration
whether there is anything currently blocking the object from moving.

22. (has-opening term) tests whether a given object has an opening. This condition
is also determined from just the type of the object. Can only be known if
the object is a container.

Complex Conditions: More complex conditions can be formed by combining other
conditions using the following operators, where ⊥ denotes the ‘unknown’ value:

1. (not pred-exp) maps 1 into 0, 0 into 1 and ⊥ into ⊥.

2. (and pred-exp pred-exp) returns 1 when both arguments are 1, 0 when at least
one argument is 0, and ⊥ otherwise.

3. (or pred-exp pred-exp) returns 0 if both arguments are 0, 1 if at least one
argument is 1, and ⊥ otherwise.

4. (exists x pred-exp) returns 1 if there is an obj for each atom of x for which
pred-exp is 1, returns 0 when pred-exp is 0 for all bindings of x, and ⊥ otherwise.

5. (exists-unique x pred-exp) returns 1 if there is exactly one binding of x for
which pred-exp is 1, 0 if pred-exp is 0 for all bindings of x or pred-exp is 1 for
more than one binding of x, and ⊥ otherwise.

6. (for-all x pred-exp) returns 1 if for all bindings of x the value of pred-exp is
1, 0 if there is some binding of x for which pred-exp is 0, and ⊥ otherwise.

Knowledge  Expressions:  Knowledge  expressions  are  Boolean-valued  Rex  expressions 
that  use  only  the  operators  andm,  orm,  notm,  and  know.  The  first  three  of  these  are 
just  the  standard  Rex  functions. 

1. (know pred-exp) evaluates pred-exp and then maps 1 to 1b, 0 to 0b, and ⊥ (unknown) to
0b. Intuitively it tests whether pred-exp is ‘known to be true’.
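The three-valued operators and the know function described above can be sketched in Python. This is a minimal illustration, not the Rex circuitry: the function names are hypothetical, the unknown value is represented by `None`, and the quantifiers here bind a single atom rather than a list of atoms.

```python
from typing import Callable, Iterable, Optional

# k-cond values: True = known true (1), False = known false (0), None = unknown.
KCond = Optional[bool]


def k_not(p: KCond) -> KCond:
    return None if p is None else (not p)


def k_and(p: KCond, q: KCond) -> KCond:
    if p is False or q is False:
        return False
    if p is True and q is True:
        return True
    return None


def k_or(p: KCond, q: KCond) -> KCond:
    if p is True or q is True:
        return True
    if p is False and q is False:
        return False
    return None


def k_exists(objs: Iterable, pred: Callable[[object], KCond]) -> KCond:
    results = [pred(o) for o in objs]
    if any(r is True for r in results):
        return True
    if all(r is False for r in results):
        return False
    return None


def k_exists_unique(objs: Iterable, pred: Callable[[object], KCond]) -> KCond:
    results = [pred(o) for o in objs]
    true_count = sum(1 for r in results if r is True)
    if true_count == 1:
        return True
    if true_count > 1 or all(r is False for r in results):
        return False
    return None


def k_for_all(objs: Iterable, pred: Callable[[object], KCond]) -> KCond:
    results = [pred(o) for o in objs]
    if any(r is False for r in results):
        return False
    if all(r is True for r in results):
        return True
    return None


def know(p: KCond) -> bool:
    """Collapse a k-cond to a Boolean: only 'known to be true' yields True."""
    return p is True
```

Note that know maps both 'known to be false' and 'unknown' to false, so `know(k_not(p))` and `not know(p)` differ exactly when p is unknown.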

2.3.5.2 And Now More Formally

<extended-rex> -> <knowledge-expression> | <term> | <rex-expression>

<knowledge-expression> ->
    (andm <knowledge-expression> <knowledge-expression>)
  | (orm <knowledge-expression> <knowledge-expression>)
  | (notm <knowledge-expression>)
  | (know <condition-expression>)

<condition-expression> ->
    (and <condition-expression> <condition-expression>)
  | (or <condition-expression> <condition-expression>)
  | (not <condition-expression>)
  | (exists x <condition-expression>)
  | (exists-unique x <condition-expression>)
  | (for-all x <condition-expression>)
  | (equal <term> <term>)
  | (<ground-condition> <extended-rex>* <term>+)

<term> ->
    (a x <knowledge-expression>)
  | <rex-obj>

where x is an atom or list of atoms, <ground-condition>s are a set of Rex functions of
type k-cond, and <rex-obj>s are Rex expressions of type obj.

2.3.5.3  Other  functions 

The above functions only return k-conds and objs. There will also be a set of Rex functions
that extract parametric values about an obj from the database. The following are functions
useful for touching and grasping.

1. (grasping-points obj) returns three object-relative poses at which the object can
be grasped.

2.  (opening  obj)  returns  the  bounds  on  the  radius  of  the  given  object’s  opening  and 
the  vector  from  the  center  of  the  object  to  the  center  of  the  opening.  If  (has-opening 
obj)  is  false  then  the  function  is  undefined. 


2.3.5.4  Examples 

1.  An  object  that  is  known  to  be  held  by  a  hand. 

(an object (know (grasping (an obj2 (know (hand obj2))) object)))

2. The object with the greatest highest Z coordinate excluding the hand and any object
the hand is holding.

(an object (know (for-all obj2
                   (or (above object obj2)
                       (grasping (an obj3 (know (hand obj3))) obj2)))))


3.  The  object  with  the  greatest  highest  Z  coordinate  that  is  above  the  given  object  and 
whose X and Y bounds intersect with the given object.

(defun highest-object-above (object)
  (an obj2 (andm (know (xy-bounds-intersect object obj2))
                 (know (above obj2 object))
                 (notm (know (exists obj3 (above obj3 obj2)))))))

4.  The  object  that  the  given  object  is  contained  within. 

(defun  surrounding-object  (object) 

(an  obj2  (know  (in  object  obj2)))) 

2.3.6  Goal  Strategies  to  Manipulation 

The  consensus  is  that  these  two  components  are  too  intertwined  to  make  it  useful  to 
specify  an  interface  between  them  in  advance.  As  the  demo  tasks  become  more  complex, 
this  division  may  be  more  useful. 

2.3.7  Manipulation  to  Arm  Control 

2.3.7.1  Overview 

REX can talk to the ZERO arm over the Ethernet via pseudo-UDP packets. A command
packet contains a few bytes of command information followed by some command
parameters. Command packets have a fixed length.

Once  a  packet  is  received  by  the  ZERO  controller,  it  begins  execution  of  the  command 
(if  possible),  and  sends  back  a  status  packet.  The  status  packet  is  parallel  to  the  command 
packet  in  that  it  has  a  few  bytes  of  status  information  followed  by  the  arm  data. 

2.3.7.2 Packet Descriptions

2.3.7.2.1 Command Packet The command packet consists of 53 bytes described in
the table below:

Byte #   Description

0        First header byte - arbitrarily set to the char 'A'
1        Second header byte - set to the char 'B'
2        Command byte - describes one of the N possible commands
3        Output byte - commands digital devices to be on or off
             Bit #   Device / State
             0       Hand - 0 = closed, 1 = open
             1-7     Unused
4-27     Commanded hand position given as a 6x1 XYZ/angle-axis
         vector or as a 6x1 joint angle vector (P1 - P6):
         6 numbers encoded as 4 byte floats, LSB first. UNIX
         and the PC use the same IEEE standard floats, but the
         SUNs store floats MSB first whereas the PC, the DEC
         machine and the lisp machine all store floats LSB first.
         The network standard is to send floats MSB first, but
         we are sending the LSB first, anyway.
28-51    Force command parameters (F1 - F6): 3 forces along
         the XYZ gripper axes and 3 torques about the XYZ
         gripper axes encoded as 4 byte floats, LSB first.
52       Bytewise checksum - [sum(bytes 0-51)] modulo 256

The first two bytes and the last byte are not really used for anything; they are left
over from the serial communication days. We leave them in just in case we have to go
back to serial communication.
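The command packet layout above can be sketched with Python's `struct` module. This is an illustrative encoder under the stated layout, not the REXARM source: the function name `build_command_packet` and its parameters are hypothetical. "LSB first" is taken to mean little-endian IEEE single-precision floats.

```python
import struct

PACKET_LEN = 53
# 'A','B' header, command byte, output byte, P1-P6, F1-F6 (little-endian floats)
COMMAND_BODY_FORMAT = "<2sBB6f6f"


def build_command_packet(command, position, force=(0.0,) * 6, hand_open=False):
    """Return the 53-byte command packet for the given command number."""
    if len(position) != 6 or len(force) != 6:
        raise ValueError("position and force vectors must each have 6 elements")
    output = 0x01 if hand_open else 0x00          # bit 0: hand, 1 = open
    body = struct.pack(COMMAND_BODY_FORMAT, b"AB", command, output,
                       *position, *force)
    checksum = sum(body) % 256                    # bytewise sum of bytes 0-51
    return body + bytes([checksum])
```

For example, `build_command_packet(28, [100.0, 0.0, 50.0, 0.0, 0.0, 0.0], hand_open=True)` would encode a variable position Cartesian move with the gripper commanded open.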

2.3.7.2.2 Status Packet The status packet also consists of 53 bytes which are described
in the table below:

Byte #   Description

0        First header byte - arbitrarily set to the char 'A'
1        Second header byte - set to the char 'B'
2        Status byte with the bit definitions:
             Bit #   Description
             0&1     Command status:
                       0 = last command accepted and completed
                       1 = execution of last command in progress
                       2 = last command failed for some reason
                       3 = bad command number in last packet
             2       1 = Bad checksum (not currently used), 0 = OK
             3       1 = Arm is in motion, 0 = Arm stationary
             4       1 = Last move command was aborted due to excessive forces,
                     0 = OK
             5       1 = Command parameters out of range, 0 = OK
             6       1 = Arm not homed, 0 = Arm ready to go
             7       1 = Gripper open, 0 = gripper closed
3        Input byte - input from digital sensors
             Bit #   Device / State
             0-7     Unused
4-27     Arm joint angles - Joint angles 1 - 6 in radians encoded
         as 4 byte floats, LSB first.
28-51    3 forces along the XYZ gripper axes and 3 torques
         about the XYZ gripper axes encoded as 4 byte floats,
         LSB first.
52       Bytewise checksum - [sum(bytes 0-51)] modulo 256

Again,  the  header  bytes  and  the  checksum  are  not  used,  but  are  left  for  reverting  to 
the  serial  communications. 
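A receiver for the status packet might decode it along the following lines. This is a sketch only: `ArmStatus` and `parse_status_packet` are hypothetical names, and it assumes that bit 0 is the least significant bit of the status byte and that bits 0 and 1 together form a two-bit command status code.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ArmStatus:
    command_status: int          # 0 done, 1 in progress, 2 failed, 3 bad command
    bad_checksum: bool
    in_motion: bool
    force_abort: bool
    out_of_range: bool
    not_homed: bool
    gripper_open: bool
    joint_angles: Tuple[float, ...]   # radians, joints 1-6
    forces: Tuple[float, ...]         # 3 forces and 3 torques


def parse_status_packet(packet: bytes) -> ArmStatus:
    if len(packet) != 53:
        raise ValueError(f"status packet must be 53 bytes, got {len(packet)}")
    status = packet[2]
    values = struct.unpack_from("<12f", packet, 4)   # bytes 4-51, LSB first
    return ArmStatus(
        command_status=status & 0x03,
        bad_checksum=bool(status & 0x04),
        in_motion=bool(status & 0x08),
        force_abort=bool(status & 0x10),
        out_of_range=bool(status & 0x20),
        not_homed=bool(status & 0x40),
        gripper_open=bool(status & 0x80),
        joint_angles=values[:6],
        forces=values[6:],
    )
```

The header bytes and checksum are ignored here, in keeping with the remark that they are not currently used.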

2.3.7.3  Receiving  and  Executing  Commands 

The  general  flow  of  operation  is  as  follows:  The  ZERO  controller  waits  for  an  ethernet 
packet  to  arrive.  When  any  packet  arrives,  it  assumes  it  is  an  arm  command,  and  begins 
processing the command, setting the appropriate status flags as needed.

Some commands are initiated, but will not be completed before the next command
arrives. This situation is indicated by the first two status bits in the status byte. Most
commands, however, will be completed in the initial processing.

After a command is processed, a status packet is returned which reflects the outcome
of that command's processing. The controller then waits for the next command packet.
Commands are double buffered as they come in over the ethernet. When the controller is
ready to process a new command, the most recent command received is copied into a working
buffer for processing. If command packets come in too fast (at more than about 15 Hz),
some of them will be dropped. There is currently no indication that a packet has been
dropped.
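The "most recent command wins" behavior described above can be modeled with a very small buffer. This is a conceptual sketch, not the controller's code: the class name is hypothetical, and the `dropped` counter is added purely for illustration, since the actual controller gives no indication of dropped packets.

```python
from typing import Optional


class LatestCommandBuffer:
    """Holds only the most recently received command packet."""

    def __init__(self) -> None:
        self._pending: Optional[bytes] = None
        self.dropped = 0          # illustrative only; REXARM does not report this

    def receive(self, packet: bytes) -> None:
        # A newer packet overwrites one that has not been processed yet.
        if self._pending is not None:
            self.dropped += 1
        self._pending = packet

    def take(self) -> Optional[bytes]:
        # Copy the latest packet out for processing and clear the slot.
        packet, self._pending = self._pending, None
        return packet
```

A sender that outpaces the controller therefore loses the older of any two packets that arrive between processing steps, which suits variable position moves where only the newest goal matters.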


2.3.7.4  Command  Descriptions 

The commands encoded by the command byte fall roughly into two categories:
setting control parameters, and starting or modifying some arm motion. These commands
are accompanied by parameters passed in the position and force command vectors in the
rest of the command packet.

In  specifying  the  arm  configuration,  the  position  vector  must  represent  the  position  of 
the  hand  in  the  world  in  some  particular  set  of  coordinates.  For  some  commands,  arm 
positions  are  commanded  as  Cartesian  coordinates  specifying  the  position  and  orientation 
of  the  gripper  relative  to  a  base  frame  which  is  fixed  relative  to  the  table. 

The gripper frame is encoded as a 6x1 position / angle-axis vector. The first three
elements of the vector are the X, Y, and Z coordinates of the origin of the gripper relative
to the base frame in millimeters. The second three elements represent a vector in the
base frame, such that if we rotate the base frame about this vector, we will get the same
orientation as the gripper frame. The amount that we need to rotate about this vector is
given by its length in radians.

Algorithms for converting back and forth between rotation matrices and angle-axis
vectors are given in John Craig’s robotics book in Chapter 2. One special case which he
leaves as an exercise to the reader is converting to the angle-axis form when the rotation
angle is 180 degrees. If the rotation matrix is given as R, the rotation vector for the 180
degree case is simply the sum of the columns of the matrix (R + I). This vector should be
scaled to have length π.
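The two conversions mentioned above can be sketched in plain Python. The first function uses the standard Rodrigues rotation formula, which is an assumption about the method rather than a quotation from Craig's book; the second follows the 180 degree rule stated in the text. Both function names are hypothetical. One caveat that follows from the algebra: for a half turn R + I = 2kk^T, so the column sum is proportional to the axis k times (kx + ky + kz), and it vanishes when the axis components sum to zero; the sketch raises an error in that case.

```python
import math


def angle_axis_to_matrix(v):
    """Rotation matrix for an angle-axis vector whose length is the angle in radians."""
    angle = math.sqrt(sum(c * c for c in v))
    if angle < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / angle for c in v)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [kx * kx * t + c,      kx * ky * t - kz * s, kx * kz * t + ky * s],
        [kx * ky * t + kz * s, ky * ky * t + c,      ky * kz * t - kx * s],
        [kx * kz * t - ky * s, ky * kz * t + kx * s, kz * kz * t + c],
    ]


def angle_axis_for_half_turn(R):
    """180 degree case: sum the columns of (R + I), then scale to length pi."""
    # Component i of the column sum of (R + I) is sum_j R[i][j] + 1.
    summed = [sum(R[i]) + 1.0 for i in range(3)]
    norm = math.sqrt(sum(c * c for c in summed))
    if norm < 1e-9:
        raise ValueError("column sum vanishes; axis components sum to zero")
    return [math.pi * c / norm for c in summed]
```

Because a rotation of 180 degrees about k is the same as one about -k, the sign of the recovered vector is not significant.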

Other  arm  commands  specify  the  configuration  of  the  arm  directly  in  joint  angles.  Note 
that  the  position  vector  returned  in  the  status  packet  is  always  in  joint  angle  coordinates. 
The  forward  kinematic  function  of  the  arm  must  be  invoked  to  convert  the  joint  angles 
into  a  Cartesian  frame. 

All of the commands available in the REXARM program are detailed below:

2.3.7.4.1  Parameter  Setting  Commands 

0 NOP - good for waiting for something to happen.

1 Robot initialization - turns the servos on and declares that the arm is in the home
position. Should only be used in homing operations.

2 Robot reset - turns the servos off and forgets where the arm is.

5 Zero force sensor - declares a new zero position for the force sensor for force-thresholded
moves.

7 Set zero position - tells the arm what its joint angles are when it is positioned in the
nest. Use only in homing the robot. Home position is specified in joint angles in the
position command vector.

8 Set speed - a single joint speed parameter between 0.0 and 1.0 stored in P1, the first
element of the position vector. Smaller is slower, bigger is faster.

9 Set acceleration - a single joint acceleration parameter between 0.0 and 1.0 stored in P1,
the first element of the position vector. Smaller is slower, bigger is faster.

15 Float - turn the motors off while continuing to monitor joint positions. Currently, the
correct position is not returned in the status packet until a move command is issued.
A freeze command should be issued before issuing any move commands.

16 Freeze - turn the motors on, servoing to the arm’s current position.

127 Halt execution of the program. The arm will automatically go back home.

2.3.7.4.2 Fixed position commands These move commands are initiated, but will
not be completed for several ticks. If one of these commands is issued, communication will
cease until the move is finished. These commands are not generally useful for REX style
programming but will work as long as the REX program is content to live with old data
until the move is finished.

10 Joint space move - move the arm to the joint angles specified in P1 - P6.

11 Relative joint space move - increment the arm joint angles by the amounts specified in
P1 - P6.

12 Single joint move - moves the joint 1-6 (specified as 1.0 - 6.0 in P1) to a specified angle
(in degrees in P2).

17 Add wobble - sets the magnitude of a wobble to be superimposed on the three wrist
joints for all subsequent fixed position move commands. Magnitude in degrees is
specified in P1 - P3.

18 Turns the wobble off.

19 Add joint space via point - specify the next point in a continuous path in joint
coordinates P1 - P6.

20 Translate gripper - translate the gripper by the X, Y, Z increments specified in P1 -
P3 without changing the orientation. Coordinates are in millimeters in base frame
coordinates.

21 Hand relative translation - same as command 20, but the coordinates are specified
relative to the hand coordinate frame.

22 Hand rotation - rotates the hand about the axis specified by P1 - P3 (in base
coordinates) by the angle specified by P4 (in degrees). The fingertips will not translate.

23 Hand relative rotation - same as command 22, but the axis is specified in hand
coordinates.

25 Cartesian move - move the gripper to the Cartesian coordinates P1 - P6 encoded as an
XYZ/angle-axis vector.

26 Add Cartesian via point - same as command 19, but the via point is specified with an
XYZ/angle-axis vector.

27 Run path - executes the continuous path set by the previously specified via points.

2.3.7.4.3 Variable Position Moves These move commands specify a single goal
position which can be modified continuously. When one of these commands is issued, the
arm will start moving towards the goal point, the controller will return a status packet
and then look immediately for the next command. New goal positions can be sent whether
the arm has reached the goal or not. These are the commands typically used by REX
programs.

28 Cartesian move - move the gripper to the Cartesian coordinates P1 - P6 encoded as
an XYZ/angle-axis vector. A command 29 should be issued before giving any other
kind of move command.

29 Halt move - take the arm out of variable position move mode and servo to the current
position of the arm.

One  last  detail  in  the  command  section  is  the  output  byte.  This  is  used  to  turn  on  or 
off  binary  devices.  The  only  device  currently  used  is  the  gripper,  which  is  commanded  by 
bit  0. 
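For reference, the command numbers listed in this section can be collected into an enumeration. The numeric values come from the lists above; the symbolic names and the helper `is_fixed_position` are hypothetical and chosen only for readability.

```python
from enum import IntEnum


class Command(IntEnum):
    # Parameter setting commands
    NOP = 0
    ROBOT_INIT = 1
    ROBOT_RESET = 2
    ZERO_FORCE_SENSOR = 5
    SET_ZERO_POSITION = 7
    SET_SPEED = 8
    SET_ACCELERATION = 9
    FLOAT = 15
    FREEZE = 16
    HALT_PROGRAM = 127
    # Fixed position commands
    JOINT_MOVE = 10
    RELATIVE_JOINT_MOVE = 11
    SINGLE_JOINT_MOVE = 12
    ADD_WOBBLE = 17
    WOBBLE_OFF = 18
    ADD_JOINT_VIA_POINT = 19
    TRANSLATE_GRIPPER = 20
    HAND_RELATIVE_TRANSLATION = 21
    HAND_ROTATION = 22
    HAND_RELATIVE_ROTATION = 23
    CARTESIAN_MOVE = 25
    ADD_CARTESIAN_VIA_POINT = 26
    RUN_PATH = 27
    # Variable position moves
    VARIABLE_CARTESIAN_MOVE = 28
    HALT_MOVE = 29


# Commands listed under "Fixed position commands" in the text.
FIXED_POSITION = frozenset(
    {10, 11, 12, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27}
)


def is_fixed_position(command: int) -> bool:
    """True if the command belongs to the fixed position group."""
    return int(command) in FIXED_POSITION
```

A REX-style program would typically stay within `VARIABLE_CARTESIAN_MOVE` and `HALT_MOVE`, since the fixed position group suspends communication until the move finishes.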

2.3.7.5 Ethernet, IP and UDP Particulars

The PC communicates over the ethernet using a 3COM 503 ethernet adapter and the
KA9Q public domain device driver. This device driver sends and receives raw ethernet
packets. Unfortunately, the rest of the REX related machinery insists upon using several
layers of communication protocols above the ethernet level. Communication takes place
through UDP packets, which are then bundled as IP packets, which are then bundled as
ethernet packets.

REXARM  uses  none  of  the  features  of  the  IP  or  UDP  protocols.  If  it  receives  a  packet 
with  its  ethernet  address,  it  strips  away  all  of  the  header  information,  assuming  the  packet 
is  an  arm  command. 


Sending  status  packets  back  to  the  REX  machine,  however,  is  difficult,  because  the 
packets  must  contain  IP  and  UDP  headers  which  are  realistic  enough  to  fool  the  UNIX 
communication  software. 

The  outer  layer  of  headers  is  the  ethernet  header  containing  the  ether  destination 
address,  ether  source  address,  and  type  code  (here,  we  use  0x0800  to  indicate  that  the 
next  lower  layer  is  an  IP  packet).  An  ethernet  checksum  is  tacked  onto  the  end  of  the 
ethernet  packet  by  the  ethernet  hardware. 

The  IP  header  starts  off  with  version,  IHL  and  type  of  service  information,  all  copied 
verbatim  from  other  similar  packets  on  the  net.  (I  don’t  know  what  these  things  mean, 
but  simply  hardwiring  their  values  seems  to  work  for  now.)  Next  is  the  IP  packet  length, 
and  identification.  The  packet  length  is  the  length  of  the  IP  packet  in  bytes,  and  the 
identification  we  arbitrarily  set  to  the  number  of  packets  sent.  The  flags,  fragment  offset, 
and  time  to  live  fields  are  again  just  hardwired  to  similar  values  seen  on  the  net.  The 
protocol  is  set  to  0x11  for  UDP  packets,  and  the  IP  header  checksum  is  calculated  through 
some  completely  convoluted  means.  The  last  things  to  deal  with  are  the  IP  source  and 
destination  addresses. 

The  next  level  header,  the  UDP  header,  is  quite  a  bit  simpler  -  it  contains  the  UDP 
source  and  destination  port  numbers,  the  UDP  packet  length,  and  a  UDP  checksum.  The 
UDP  checksum  is  simply  set  to  0,  but  nobody  seems  to  care. 
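The IP and UDP headers described above can be sketched as follows. This is not the REXARM code: the function names are hypothetical, the version/IHL byte 0x45, type of service 0, flags 0, and time to live 64 are assumed typical values (the text only says they were copied from other packets), and the header checksum shown is the standard IPv4 one's-complement sum over 16-bit words. In UDP over IPv4 a checksum of 0 means that no checksum was computed, which is why receivers accept it.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ip_udp_headers(payload: bytes, src_ip: bytes, dst_ip: bytes,
                         src_port: int, dst_port: int, ident: int,
                         ttl: int = 64) -> bytes:
    """Return a 20-byte IP header followed by an 8-byte UDP header."""
    udp_len = 8 + len(payload)
    total_len = 20 + udp_len
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0,               # version/IHL, type of service (assumed)
                     total_len, ident & 0xFFFF,
                     0,                     # flags and fragment offset (assumed)
                     ttl, 0x11,             # time to live (assumed), protocol = UDP
                     0,                     # checksum placeholder
                     src_ip, dst_ip)
    ip = ip[:10] + struct.pack("!H", internet_checksum(ip)) + ip[12:]
    udp = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)  # UDP checksum 0
    return ip + udp
```

Recomputing `internet_checksum` over a finished IP header returns 0, which is a convenient self-check before handing the packet to the ethernet driver.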

The  ethernet  addresses,  IP  addresses  and  UDP  port  numbers  of  all  the  relevant  ma¬ 
chines  are  listed  below: 

zero_enet_addr[] = { 0x02, 0x60, 0x8c, 0x0c, 0x15, 0x3a };
zero_ip_addr[] = { 0xc0, 0x2a, 0x08, 0x16 };

sherman_enet_addr[] = { 0x08, 0x00, 0x20, 0x01, 0x22, 0xce };
sherman_ip_addr[] = { 0xc0, 0x2a, 0x08, 0x02 };
sherman_udp_port[] = { 0x06, 0x40 };

wayback_enet_addr[] = { 0x08, 0x00, 0x20, 0x01, 0x01, 0x2e };
wayback_ip_addr[] = { 0xc0, 0x2a, 0x08, 0x04 };
wayback_udp_port[] = { 0x05, 0xc2 };

boris_enet_addr[] = { 0x08, 0x00, 0x2B, 0x0F, 0x0E, 0x41 };
boris_ip_addr[] = { 0xc0, 0x2a, 0x08, 0x17 };
boris_udp_port[] = { 0x06, 0x40 };


2.3.7.6  Owning  and  Operating  REXARM 

The main thing to remember for safe robot operation is to turn the power on only when
REXARM prompts you, and turn the power off whenever things look dicey.

Before  turning  on  the  PC,  first  make  sure  that  the  two  40  pin  and  one  10  pin  flat 
ribbon  cables  are  plugged  in,  as  well  as  the  power  connector  and  the  air  hoses.  All  these 
connections  are  made  in  the  metal  box  next  to  the  arm.  Open  up  the  valve  on  the  air  tank 
to  make  sure  there  is  air  for  the  gripper.  The  gauge  should  read  between  40  and  60  psi. 
Fill up the tank if needed. Make sure the arm is placed in its nest.

With everything in place, turn on the PC and move to the TELEOS directory. Once
there,  type  REXARM2  to  start  up  the  current  version  of  the  program.  If  you  just  want  to 
test  a  REX  program  without  actually  moving  the  arm,  indicate  so  at  the  prompt.  When 
asked  for  the  name  of  the  REX  machine  type  ’boris’  or  ’wayback’,  depending  on  which 
machine  is  being  used.  Finally,  when  prompted,  push  the  green  power  button  on  the  arm 
power  supply  and  turn  on  the  air.  (Don’t  bother  if  you  indicated  that  you  don’t  want  to 
use  the  arm.)  From  here  on  out,  it  is  a  good  idea  to  hold  onto  the  remote  kill  button. 

After  hitting  a  key  to  acknowledge  that  power  is  enabled,  the  arm  will  go  through  its 
homing  procedure.  The  arm  will  then  move  to  the  ready  position,  and  is  ready  to  accept 
ethernet packets. As commands are sent, packet information will be printed on the screen.
Note,  however,  that  while  a  variable  position  move  is  in  progress,  nothing  will  be  printed 
to  the  screen  until  a  command  29  is  issued. 

At  any  time,  a  ’q’  typed  on  the  keyboard  will  halt  the  arm,  send  it  back  to  its  home 
position,  and  terminate  the  program.  The  same  thing  happens  when  a  127  command  is 
received.  After  the  program  halts,  turn  off  the  power  supply  with  the  red  button  or  the 
remote  switch.  Don’t  forget  to  turn  off  the  air  valve. 

2.3.7.7  Known  Bugs,  Glitches  and  Deficiencies 

1.  Robot  reset  and  initialization  functions  do  not  set  and  reset  the  arm  homed  flag. 

2.  Communication  halts  during  fixed  position  commands.  These  commands  should  be 
made  interruptible. 

3.  Current  positions  and  forces  are  not  returned  while  floating. 

4.  There  is  no  prompt  for  turning  off  the  power. 

5.  After  the  arm  goes  back  into  the  nest,  periodically  a  math  floating  point  error  shows 
up.  If  this  is  the  case,  turn  off  the  arm  power  and  reboot  the  PC  for  good  measure. 

6.  There  is  no  function  for  setting  force  thresholds  on  moves. 

7.  If  a  variable  position  command  is  aborted,  another  move  command  must  be  issued 
to  detect  the  abort  condition.  A  command  29  will  wipe  out  that  information. 

8. There is no function for changing the gripper frame.




